Tarantool development patches archive
 help / color / mirror / Atom feed
* [tarantool-patches] [PATCH 0/7] Expose ICU into Lua
@ 2018-04-25 23:29 Vladislav Shpilevoy
  2018-04-25 23:29 ` [tarantool-patches] [PATCH 1/7] lua: expose ICU upper/lower functions to Lua Vladislav Shpilevoy
                   ` (7 more replies)
  0 siblings, 8 replies; 14+ messages in thread
From: Vladislav Shpilevoy @ 2018-04-25 23:29 UTC (permalink / raw)
  To: tarantool-patches; +Cc: kostja

Branch: http://github.com/tarantool/tarantool/tree/gh-3290-lua-icu-ucasemap
Issue: https://github.com/tarantool/tarantool/issues/3290
Issue: https://github.com/tarantool/tarantool/issues/3081

Lua can not work with unicode - in Lua it is enterpreted as a binary. On such
string built-in upper/lower functions, '#' (length) and comparison operators do
not work. But Tarantool links with ICU and has comparators with collations, that
can solve the problems.

But there is another issue - string methods must be available before box.cfg,
so the ICU and collations must be built out of main 'box' static library. To do
this the collation related files are moved from 'box' into 'core' library.

A second issue is that when box.cfg is called, it inserts built-in collations
into _collation space, and these insertions can conflict with built-in
collations, created before box.cfg. Delete from _collation can break the
collations cache as well. The patchset solves this by checking collations
deletions and insertions, and if they tries to operate on built-in collations,
then they are ignored - a user sees changes in _collation, but the cache is
unchanged.

Vladislav Shpilevoy (7):
  lua: expose ICU upper/lower functions to Lua
  lua: implement string.u_count
  alter: fix assertion in collations alter
  Move struct on_access_denied_ctx into error.h
  Merge box_error, stat and collations into core library
  Always store built-in collations in the cache
  lua: expose u_compare/u_icompare into Lua

 cmake/module.cmake           |   4 +-
 extra/exports                |   3 +
 src/CMakeLists.txt           |  17 +++-
 src/box/CMakeLists.txt       |  18 +---
 src/box/alter.cc             | 129 ++++++++++++++----------
 src/box/lua/call.c           |   2 +-
 src/box/lua/error.cc         |   2 +-
 src/box/lua/net_box.c        |   2 +-
 src/box/lua/tuple.c          |   2 +-
 src/box/lua/xlog.c           |   2 +-
 src/box/schema.h             |  12 ---
 src/{box => }/coll.c         |   0
 src/{box => }/coll.h         |   0
 src/{box => }/coll_cache.c   |  34 +++++++
 src/{box => }/coll_cache.h   |   0
 src/{box => }/coll_def.c     |   7 ++
 src/{box => }/coll_def.h     |  13 +++
 src/{box => }/errcode.c      |   0
 src/{box => }/errcode.h      |   0
 src/{box => }/error.cc       |   1 -
 src/{box => }/error.h        |  12 +++
 src/lua/init.c               |   2 +-
 src/lua/string.lua           | 227 +++++++++++++++++++++++++++++++++++++++++++
 src/main.cc                  |   2 +-
 src/{box => }/opt_def.c      |   0
 src/{box => }/opt_def.h      |   0
 src/util.c                   |  79 ++++++++++++++-
 test/app-tap/string.test.lua |  70 ++++++++++++-
 test/box/ddl.result          |  25 +++++
 test/box/ddl.test.lua        |  11 +++
 test/unit/CMakeLists.txt     |   8 +-
 test/unit/coll.cpp           |   3 +-
 32 files changed, 585 insertions(+), 102 deletions(-)
 rename src/{box => }/coll.c (100%)
 rename src/{box => }/coll.h (100%)
 rename src/{box => }/coll_cache.c (75%)
 rename src/{box => }/coll_cache.h (100%)
 rename src/{box => }/coll_def.c (96%)
 rename src/{box => }/coll_def.h (94%)
 rename src/{box => }/errcode.c (100%)
 rename src/{box => }/errcode.h (100%)
 rename src/{box => }/error.cc (99%)
 rename src/{box => }/error.h (95%)
 rename src/{box => }/opt_def.c (100%)
 rename src/{box => }/opt_def.h (100%)

-- 
2.15.1 (Apple Git-101)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [tarantool-patches] [PATCH 1/7] lua: expose ICU upper/lower functions to Lua
  2018-04-25 23:29 [tarantool-patches] [PATCH 0/7] Expose ICU into Lua Vladislav Shpilevoy
@ 2018-04-25 23:29 ` Vladislav Shpilevoy
  2018-04-28  0:56   ` [tarantool-patches] " Alexander Turenko
  2018-04-25 23:29 ` [tarantool-patches] [PATCH 2/7] lua: implement string.u_count Vladislav Shpilevoy
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 14+ messages in thread
From: Vladislav Shpilevoy @ 2018-04-25 23:29 UTC (permalink / raw)
  To: tarantool-patches; +Cc: kostja

Lua can not work with unicode - in Lua it is enterpreted as a
binary. On such string built-in upper/lower functions do not
work. But Tarantool links with ICU that can solve the problem.
Lets expose ICU upper/lower function into Lua to enable correct
case transformations.

Closes #3290
---
 src/lua/init.c               |   2 +-
 src/lua/string.lua           | 140 +++++++++++++++++++++++++++++++++++++++++++
 test/app-tap/string.test.lua |  34 ++++++++++-
 3 files changed, 174 insertions(+), 2 deletions(-)

diff --git a/src/lua/init.c b/src/lua/init.c
index a0a7f63f6..9149362a0 100644
--- a/src/lua/init.c
+++ b/src/lua/init.c
@@ -124,9 +124,9 @@ static const char *lua_modules[] = {
 	"errno", errno_lua,
 	"fiber", fiber_lua,
 	"env", env_lua,
-	"string", string_lua,
 	"table", table_lua,
 	"buffer", buffer_lua,
+	"string", string_lua,
 	"msgpackffi", msgpackffi_lua,
 	"crypto", crypto_lua,
 	"digest", digest_lua,
diff --git a/src/lua/string.lua b/src/lua/string.lua
index 5ff64c9f6..1c7226143 100644
--- a/src/lua/string.lua
+++ b/src/lua/string.lua
@@ -1,4 +1,5 @@
 local ffi = require('ffi')
+local buffer = require('buffer')
 
 ffi.cdef[[
     const char *
@@ -6,6 +7,28 @@ ffi.cdef[[
            const char *needle,   size_t needle_len);
     int memcmp(const char *mem1, const char *mem2, size_t num);
     int isspace(int c);
+
+    typedef struct UCaseMap UCaseMap;
+    typedef int UErrorCode;
+
+    int32_t
+    ucasemap_utf8ToLower(const UCaseMap *csm, char *dest, int32_t destCapacity,
+                         const char *src, int32_t srcLength,
+                         UErrorCode *pErrorCode);
+
+    int32_t
+    ucasemap_utf8ToUpper(const UCaseMap *csm, char *dest, int32_t destCapacity,
+                         const char *src, int32_t srcLength,
+                         UErrorCode *pErrorCode);
+
+    UCaseMap *
+    ucasemap_open(const char *locale, uint32_t options, UErrorCode *pErrorCode);
+
+    void
+    ucasemap_close(UCaseMap *csm);
+
+    const char *
+    u_errorName(UErrorCode code);
 ]]
 
 local c_char_ptr = ffi.typeof('const char *')
@@ -313,6 +336,121 @@ local function string_rstrip(inp)
     return (string.gsub(inp, "(.-)%s*$", "%1"))
 end
 
+--
+-- ICU bindings.
+--
+--
+-- Ucasemap cache allows to do not create a new UCaseMap on each
+-- u_upper/u_lower call. References are weak to do not keep all
+-- ever created maps, so the cache is cleared periodically.
+--
+local ucasemap_cache = setmetatable({}, {__mode = 'v'})
+local errcode = ffi.new('int[1]')
+errcode[0] = 0
+--
+-- ICU UCaseMethod requires 0 error code as input, so after any
+-- error the errcode must be nullified.
+--
+local function icu_clear_error()
+    errcode[0] = 0
+end
+--
+-- String representation of the latest ICU error.
+--
+local function icu_error()
+    return ffi.string(ffi.C.u_errorName(errcode[0]))
+end
+--
+-- Find cached UCaseMap for @a locale, or create a new one and
+-- cache it.
+-- @param locale String locale or box.NULL for default.
+-- @retval nil Can neither get or create a UCaseMap.
+-- @retval not nil Needed UCaseMap.
+--
+local function ucasemap_retrieve(locale)
+    local ret = ucasemap_cache[locale]
+    if not ret then
+        ret = ffi.C.ucasemap_open(c_char_ptr(locale), 0, errcode)
+        if ret ~= nil then
+            ffi.gc(ret, ffi.C.ucasemap_close)
+            ucasemap_cache[locale] = ret
+        end
+    end
+    return ret
+end
+--
+-- Check ICU options for string.u_upper/u_lower.
+-- @param opts Options. Can contain only one option - locale.
+-- @param usage_err What to throw if opts types are violated.
+-- @retval String locale if found.
+-- @retval box.NULL if locale is not found.
+--
+local function icu_check_case_opts(opts, usage_err)
+    if opts then
+        if type(opts) ~= 'table' then
+            error(usage_err)
+        end
+        if opts.locale then
+            if type(opts.locale) ~= 'string' then
+                error(usage_err)
+            end
+            return opts.locale
+        end
+    end
+    return box.NULL
+end
+--
+-- Create upper/lower case version of @an inp string.
+-- @param inp Input string.
+-- @param opts Options. Can contain only one option - locale. In
+--        different locales different capital letters can exist
+--        for the same symbol. For example, in turkish locale
+--        upper('i') == 'İ', in english locale it is 'I'. See ICU
+--        documentation for locales.
+-- @param func Upper or lower FFI function.
+-- @param usage What to print on usage error.
+-- @retval nil, error Error.
+-- @retval not nil Uppercase version of @an inp.
+--
+local function string_u_to_case_impl(inp, opts, func, usage)
+    if type(inp) ~= 'string' then
+        error(usage)
+    end
+    icu_clear_error()
+    local map = ucasemap_retrieve(icu_check_case_opts(opts, usage))
+    if not map then
+        return nil, icu_error()
+    end
+    local src_len = #inp
+    inp = c_char_ptr(inp)
+    local buf = buffer.IBUF_SHARED
+    local buf_raw, ret
+    -- +1 for NULL termination. Else error appears in errcode.
+    local dst_len = src_len + 1
+::do_convert::
+    buf:reset()
+    buf_raw = buf:alloc(dst_len)
+    ret = func(map, buf_raw, dst_len, inp, src_len, errcode)
+    if ret <= dst_len then
+        if ret == 0 and errcode[0] ~= 0 then
+            return nil, icu_error()
+        end
+        return ffi.string(buf_raw, ret)
+    else
+        dst_len = ret + 1
+        goto do_convert
+    end
+end
+
+local function string_u_upper(inp, opts)
+    local usage = 'Usage: string.u_upper(str, {[locale = <string>}])'
+    return string_u_to_case_impl(inp, opts, ffi.C.ucasemap_utf8ToUpper, usage)
+end
+
+local function string_u_lower(inp, opts)
+    local usage = 'Usage: string.u_lower(str, {[locale = <string>}])'
+    return string_u_to_case_impl(inp, opts, ffi.C.ucasemap_utf8ToLower, usage)
+end
 
 -- It'll automatically set string methods, too.
 local string = require('string')
@@ -326,3 +464,5 @@ string.hex        = string_hex
 string.strip      = string_strip
 string.lstrip      = string_lstrip
 string.rstrip      = string_rstrip
+string.u_upper    = string_u_upper
+string.u_lower    = string_u_lower
diff --git a/test/app-tap/string.test.lua b/test/app-tap/string.test.lua
index 852a7923c..004e149e9 100755
--- a/test/app-tap/string.test.lua
+++ b/test/app-tap/string.test.lua
@@ -3,7 +3,7 @@
 local tap = require('tap')
 local test = tap.test("string extensions")
 
-test:plan(5)
+test:plan(6)
 
 test:test("split", function(test)
     test:plan(10)
@@ -114,6 +114,38 @@ test:test("hex", function(test)
     test:is(string.hex(""), "", "hex empty string")
 end)
 
+test:test("unicode", function(test)
+    test:plan(12)
+    local str = 'хеЛлоу вОрЛд ё Ё я Я э Э ъ Ъ hElLo WorLd 1234 i I İ 勺#☢༺'
+    local upper_res = 'ХЕЛЛОУ ВОРЛД Ё Ё Я Я Э Э Ъ Ъ HELLO WORLD 1234 I I İ 勺#☢༺'
+    local upper_turkish = 'ХЕЛЛОУ ВОРЛД Ё Ё Я Я Э Э Ъ Ъ HELLO WORLD 1234 İ I İ 勺#☢༺'
+    local lower_res = 'хеллоу ворлд ё ё я я э э ъ ъ hello world 1234 i i i̇ 勺#☢༺'
+    local lower_turkish = 'хеллоу ворлд ё ё я я э э ъ ъ hello world 1234 i ı i 勺#☢༺'
+    local s = string.u_upper(str)
+    test:is(s, upper_res, 'default locale upper')
+    s = string.u_lower(str)
+    test:is(s, lower_res, 'default locale lower')
+    s = string.u_upper(str, {locale = 'en_US'})
+    test:is(s, upper_res, 'en_US locale upper')
+    s = string.u_lower(str, {locale = 'en_US'})
+    test:is(s, lower_res, 'en_US locale lower')
+    s = string.u_upper(str, {locale = 'ru_RU'})
+    test:is(s, upper_res, 'ru_RU locale upper')
+    s = string.u_lower(str, {locale = 'ru_RU'})
+    test:is(s, lower_res, 'ru_RU locale lower')
+    s = string.u_upper(str, {locale = 'tr_TR'})
+    test:is(s, upper_turkish, 'tr_TR locale upper')
+    s = string.u_lower(str, {locale = 'tr_TR'})
+    test:is(s, lower_turkish, 'tr_TR locale lower')
+    local err
+    s, err = string.u_upper(str, {locale = 'not_existing locale tratatatata'})
+    test:is(s, upper_res, 'incorrect locale turns into default upper')
+    test:isnil(err, 'upper error is nil')
+    s, err = string.u_lower(str, {locale = 'not_existing locale tratatatata'})
+    test:is(s, lower_res, 'incorrect locale turns into default lower')
+    test:isnil(err, 'lower error is nil')
+end)
+
 test:test("strip", function(test)
     test:plan(6)
     local str = "  hello hello "
-- 
2.15.1 (Apple Git-101)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [tarantool-patches] [PATCH 2/7] lua: implement string.u_count
  2018-04-25 23:29 [tarantool-patches] [PATCH 0/7] Expose ICU into Lua Vladislav Shpilevoy
  2018-04-25 23:29 ` [tarantool-patches] [PATCH 1/7] lua: expose ICU upper/lower functions to Lua Vladislav Shpilevoy
@ 2018-04-25 23:29 ` Vladislav Shpilevoy
  2018-04-26 10:36   ` [tarantool-patches] " Vladislav Shpilevoy
                     ` (3 more replies)
  2018-04-25 23:29 ` [tarantool-patches] [PATCH 3/7] alter: fix assertion in collations alter Vladislav Shpilevoy
                   ` (5 subsequent siblings)
  7 siblings, 4 replies; 14+ messages in thread
From: Vladislav Shpilevoy @ 2018-04-25 23:29 UTC (permalink / raw)
  To: tarantool-patches; +Cc: kostja

Lua can not calculate length of a unicode string correctly. But
Tarantool has ICU on board - lets use it to calculate length.

u_count has options, that allows to count only symbols of a
specific class, for example, only capital letters, or digits.
Options can be combined.

Closes #3081
---
 extra/exports                |  1 +
 src/CMakeLists.txt           |  1 +
 src/lua/string.lua           | 52 ++++++++++++++++++++++++++++++++++++++++++++
 src/util.c                   | 48 +++++++++++++++++++++++++++++++++++++++-
 test/app-tap/string.test.lua | 22 ++++++++++++++++++-
 5 files changed, 122 insertions(+), 2 deletions(-)

diff --git a/extra/exports b/extra/exports
index a274bb23b..b0480fe79 100644
--- a/extra/exports
+++ b/extra/exports
@@ -40,6 +40,7 @@ title_set_status
 title_get_status
 exception_get_string
 exception_get_int
+u_count
 
 tarantool_lua_ibuf
 uuid_nil
diff --git a/src/CMakeLists.txt b/src/CMakeLists.txt
index 8ab09e968..f489c88cf 100644
--- a/src/CMakeLists.txt
+++ b/src/CMakeLists.txt
@@ -110,6 +110,7 @@ target_link_libraries(core
     ${LIBEIO_LIBRARIES}
     ${LIBCORO_LIBRARIES}
     ${MSGPUCK_LIBRARIES}
+    ${ICU_LIBRARIES}
 )
 
 add_library(stat STATIC rmean.c latency.c histogram.c)
diff --git a/src/lua/string.lua b/src/lua/string.lua
index 1c7226143..6c566cb54 100644
--- a/src/lua/string.lua
+++ b/src/lua/string.lua
@@ -29,6 +29,9 @@ ffi.cdef[[
 
     const char *
     u_errorName(UErrorCode code);
+
+    int
+    u_count(const char *s, int bsize, uint8_t flags);
 ]]
 
 local c_char_ptr = ffi.typeof('const char *')
@@ -452,6 +455,54 @@ local function string_u_lower(inp, opts)
     return string_u_to_case_impl(inp, opts, ffi.C.ucasemap_utf8ToLower, usage)
 end
 
+local U_COUNT_CLASS_ALL = 0
+local U_COUNT_CLASS_UPPER_LETTER = 1
+local U_COUNT_CLASS_LOWER_LETTER = 2
+local U_COUNT_CLASS_DIGIT = 4
+
+--
+-- Calculate count of symbols matching the needed classes.
+-- @param inp Input UTF8 string.
+-- @param opts Options with needed classes. It supports 'all',
+--        'upper', 'lower', 'digit'. Opts is a table, where needed
+--        class key is set to true. By default all classes are
+--        needed, and count works like strlen (not bsize, like Lua
+--        operator '#').
+-- @retval not nil Summary count of needed symbols.
+-- @retval nil, position Invalid UTF8 on returned position.
+--
+local function string_u_count(inp, opts)
+    local usage = 'Usage: string.u_count(str)'
+    if type(inp) ~= 'string' then
+        error(usage)
+    end
+    local flags = 0
+    if opts then
+        if type(opts) ~= 'table' then
+            error(usage)
+        end
+        if not opts.all then
+            if opts.upper then
+                flags = bit.bor(flags, U_COUNT_CLASS_UPPER_LETTER)
+            end
+            if opts.lower then
+                flags = bit.bor(flags, U_COUNT_CLASS_LOWER_LETTER)
+            end
+            if opts.digit then
+                flags = bit.bor(flags, U_COUNT_CLASS_DIGIT)
+            end
+        end
+    end
+    local len = #inp
+    inp = c_char_ptr(inp)
+    local ret = ffi.C.u_count(inp, len, flags)
+    if ret >= 0 then
+        return ret
+    else
+        return nil, -ret
+    end
+end
+
 -- It'll automatically set string methods, too.
 local string = require('string')
 string.split      = string_split
@@ -466,3 +517,4 @@ string.lstrip      = string_lstrip
 string.rstrip      = string_rstrip
 string.u_upper    = string_u_upper
 string.u_lower    = string_u_lower
+string.u_count    = string_u_count
diff --git a/src/util.c b/src/util.c
index 9458695b9..c117dee05 100644
--- a/src/util.c
+++ b/src/util.c
@@ -40,7 +40,8 @@
 #include <time.h>
 #include <unistd.h>
 #include <limits.h>
-
+#include <unicode/utf8.h>
+#include <unicode/uchar.h>
 #include <msgpuck/msgpuck.h> /* mp_char2escape[] table */
 
 #include "say.h"
@@ -321,3 +322,48 @@ fpconv_check()
 	 */
 	assert(buf[1] == '.');
 }
+
+enum u_count_class {
+	U_COUNT_CLASS_ALL = 0,
+	U_COUNT_CLASS_UPPER_LETTER = 1,
+	U_COUNT_CLASS_LOWER_LETTER = 2,
+	U_COUNT_CLASS_DIGIT = 4,
+};
+
+/**
+ * Get length of a UTF8 string.
+ * @param s UTF8 string.
+ * @param bsize Binary size of @an s.
+ * @param flags Binary OR of u_count_class flags.
+ * @retval >=0 Count of symbols matched one of @a flags.
+ * @retval  <0 Invalid UTF8 on the position -1 * returned value.
+ */
+int
+u_count(const char *s, int bsize, uint8_t flags)
+{
+	int offset = 0;
+	int len = 0;
+	UChar32 c;
+	if (flags == 0) {
+		/* Fast path - just calculate strlen. */
+		while (offset < bsize) {
+			U8_NEXT(s, offset, bsize, c);
+			if (c == U_SENTINEL)
+				return -(len + 1);
+			++len;
+		}
+		return len;
+	}
+	/* Slow path - must check each symbol to match flags. */
+	while (offset < bsize) {
+		U8_NEXT(s, offset, bsize, c);
+		if (c == U_SENTINEL)
+			return -(len + 1);
+		uint8_t f = 0;
+		f |= (flags & U_COUNT_CLASS_UPPER_LETTER) != 0 && u_isupper(c);
+		f |= (flags & U_COUNT_CLASS_LOWER_LETTER) != 0 && u_islower(c);
+		f |= (flags & U_COUNT_CLASS_DIGIT) != 0 && u_isdigit(c);
+		len += f != 0 ? 1 : 0;
+	}
+	return len;
+}
diff --git a/test/app-tap/string.test.lua b/test/app-tap/string.test.lua
index 004e149e9..650a5982d 100755
--- a/test/app-tap/string.test.lua
+++ b/test/app-tap/string.test.lua
@@ -115,7 +115,7 @@ test:test("hex", function(test)
 end)
 
 test:test("unicode", function(test)
-    test:plan(12)
+    test:plan(24)
     local str = 'хеЛлоу вОрЛд ё Ё я Я э Э ъ Ъ hElLo WorLd 1234 i I İ 勺#☢༺'
     local upper_res = 'ХЕЛЛОУ ВОРЛД Ё Ё Я Я Э Э Ъ Ъ HELLO WORLD 1234 I I İ 勺#☢༺'
     local upper_turkish = 'ХЕЛЛОУ ВОРЛД Ё Ё Я Я Э Э Ъ Ъ HELLO WORLD 1234 İ I İ 勺#☢༺'
@@ -144,6 +144,26 @@ test:test("unicode", function(test)
     s, err = string.u_lower(str, {locale = 'not_existing locale tratatatata'})
     test:is(s, lower_res, 'incorrect locale turns into default lower')
     test:isnil(err, 'lower error is nil')
+
+    -- Test u_count.
+    test:is(string.u_count(str), 56, 'u_count works')
+    s, err = string.u_count("\xE2\x80\xE2")
+    test:is(err, 1, 'u_count checks for errors')
+    test:isnil(s, 'retval is nil on error')
+    test:is(string.u_count(''), 0, 'u_count works on empty strings')
+    s, err = pcall(string.u_count, 100)
+    test:isnt(err:find('Usage'), nil, 'usage is checked')
+    -- Test different symbol classes.
+    s, err = pcall(string.u_count, str, 1234)
+    test:isnt(err:find('Usage'), nil, 'usage checks options')
+    test:is(string.u_count(str, {all = true}), 56, 'option all')
+    test:is(string.u_count(str, {upper = true}), 13, 'option upper')
+    test:is(string.u_count(str, {lower = true}), 19, 'option lower')
+    test:is(string.u_count(str, {upper = true, lower = true}), 32,
+            'options upper and lower')
+    test:is(string.u_count(str, {digit = true}), 4, 'option digit')
+    test:is(string.u_count(str, {digit = true, upper = true}), 17,
+            'options digit and upper')
 end)
 
 test:test("strip", function(test)
-- 
2.15.1 (Apple Git-101)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [tarantool-patches] [PATCH 3/7] alter: fix assertion in collations alter
  2018-04-25 23:29 [tarantool-patches] [PATCH 0/7] Expose ICU into Lua Vladislav Shpilevoy
  2018-04-25 23:29 ` [tarantool-patches] [PATCH 1/7] lua: expose ICU upper/lower functions to Lua Vladislav Shpilevoy
  2018-04-25 23:29 ` [tarantool-patches] [PATCH 2/7] lua: implement string.u_count Vladislav Shpilevoy
@ 2018-04-25 23:29 ` Vladislav Shpilevoy
  2018-04-25 23:29 ` [tarantool-patches] [PATCH 4/7] Move struct on_access_denied_ctx into error.h Vladislav Shpilevoy
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Vladislav Shpilevoy @ 2018-04-25 23:29 UTC (permalink / raw)
  To: tarantool-patches; +Cc: kostja

Forbid collations update - it leads to an assertion in collation
cache about unavailability to correctly replace collations.
Refactor collations deletion and insertion.

Besides, collation update potentially changes indexing order with
no actual indexes rebuilding.
---
 src/box/alter.cc      | 115 ++++++++++++++++++++++++++------------------------
 test/box/ddl.result   |  10 +++++
 test/box/ddl.test.lua |   3 ++
 3 files changed, 74 insertions(+), 54 deletions(-)

diff --git a/src/box/alter.cc b/src/box/alter.cc
index 77b03e680..be2ccc061 100644
--- a/src/box/alter.cc
+++ b/src/box/alter.cc
@@ -2376,19 +2376,31 @@ coll_cache_rollback(struct trigger *trigger, void *event)
 	struct txn_stmt *stmt = txn_last_stmt((struct txn*) event);
 	struct tuple *new_tuple = stmt->new_tuple;
 
-	if (new_tuple != NULL) {
+	if (new_tuple == NULL && old_coll != NULL) {
+		/* Rollback DELETE. */
+		if (coll_by_id(old_coll->id) == old_coll) {
+			/*
+			 * Nothing to do - the tx had been failed
+			 * before old collation deletion.
+			 */
+			return;
+		}
+		struct coll *replaced;
+		if (coll_cache_replace(old_coll, &replaced) != 0) {
+			panic("Out of memory on insertion into collation "\
+			      "cache");
+		}
+		assert(replaced == NULL);
+	} else {
+		assert(new_tuple != NULL && old_coll == NULL);
 		uint32_t new_id = tuple_field_u32_xc(new_tuple,
 						     BOX_COLLATION_FIELD_ID);
 		struct coll *new_coll = coll_by_id(new_id);
-		coll_cache_delete(new_coll);
-		coll_delete(new_coll);
-	}
-
-	if (old_coll != NULL) {
-		struct coll *replaced;
-		int rc = coll_cache_replace(old_coll, &replaced);
-		assert(rc == 0 && replaced == NULL);
-		(void)rc;
+		/* Can be NULL, if failed before cache update. */
+		if (new_coll != NULL) {
+			coll_cache_delete(new_coll);
+			coll_delete(new_coll);
+		}
 	}
 }
 
@@ -2412,59 +2424,54 @@ on_replace_dd_collation(struct trigger * /* trigger */, void *event)
 	struct tuple *old_tuple = stmt->old_tuple;
 	struct tuple *new_tuple = stmt->new_tuple;
 	txn_check_singlestatement_xc(txn, "Space _collation");
-	struct coll *old_coll = NULL;
-	if (old_tuple != NULL) {
+	if (new_tuple == NULL && old_tuple != NULL) {
+		/* DELETE */
 		/* TODO: Check that no index uses the collation */
-		uint32_t old_id = tuple_field_u32_xc(old_tuple,
-						     BOX_COLLATION_FIELD_ID);
-		old_coll = coll_by_id(old_id);
+		int32_t old_id = tuple_field_u32_xc(old_tuple,
+						    BOX_COLLATION_FIELD_ID);
+		struct coll *old_coll = coll_by_id(old_id);
 		assert(old_coll != NULL);
 		access_check_ddl(old_coll->name, old_coll->owner_id,
-				 SC_COLLATION,
-				 new_tuple == NULL ? PRIV_D: PRIV_A,
-				 false);
-
-		struct trigger *on_commit =
-			txn_alter_trigger_new(coll_cache_delete_coll, old_coll);
-		txn_on_commit(txn, on_commit);
-	}
-
-	if (new_tuple == NULL) {
-		/* Simple DELETE */
-		assert(old_tuple != NULL);
-		coll_cache_delete(old_coll);
-
+				 SC_COLLATION, PRIV_D, false);
+		/*
+		 * Set on_rollback before deletion from the cache.
+		 * Else rollback setting can fail and the
+		 * collation will be lost.
+		 */
 		struct trigger *on_rollback =
 			txn_alter_trigger_new(coll_cache_rollback, old_coll);
 		txn_on_rollback(txn, on_rollback);
-		return;
-	}
-
-	struct coll_def new_def;
-	coll_def_new_from_tuple(new_tuple, &new_def);
-	access_check_ddl(new_def.name, new_def.owner_id, SC_COLLATION,
-			 old_tuple == NULL ? PRIV_C : PRIV_A, false);
-	struct coll *new_coll = coll_new(&new_def);
-	if (new_coll == NULL)
-		diag_raise();
-	auto def_guard = make_scoped_guard([=] { coll_delete(new_coll); });
-
-	struct coll *replaced;
-	if (coll_cache_replace(new_coll, &replaced) != 0)
-		diag_raise();
-	if (replaced == NULL && old_coll != NULL) {
+		coll_cache_delete(old_coll);
+		struct trigger *on_commit =
+			txn_alter_trigger_new(coll_cache_delete_coll, old_coll);
+		txn_on_commit(txn, on_commit);
+	} else if (new_tuple != NULL && old_tuple == NULL) {
+		/* INSERT */
+		struct coll_def new_def;
+		coll_def_new_from_tuple(new_tuple, &new_def);
+		access_check_ddl(new_def.name, new_def.owner_id, SC_COLLATION,
+				 PRIV_C, false);
 		/*
-		 * ID of a collation was changed.
-		 * Remove collation by old ID.
+		 * Set on_rollback before the collation is
+		 * inserted into the cache. Else if the trigger
+		 * creation fails the collation will not be
+		 * deleted on abort.
 		 */
-		coll_cache_delete(old_coll);
+		struct trigger *on_rollback =
+			txn_alter_trigger_new(coll_cache_rollback, NULL);
+		txn_on_rollback(txn, on_rollback);
+		struct coll *new_coll = coll_new(&new_def);
+		if (new_coll == NULL)
+			diag_raise();
+		struct coll *replaced;
+		if (coll_cache_replace(new_coll, &replaced) != 0)
+			diag_raise();
+		assert(replaced == NULL);
+	} else {
+		/* UPDATE */
+		assert(new_tuple != NULL && old_tuple != NULL);
+		tnt_raise(ClientError, ER_UNSUPPORTED, "collation", "alter");
 	}
-
-	struct trigger *on_rollback =
-		txn_alter_trigger_new(coll_cache_rollback, old_coll);
-	txn_on_rollback(txn, on_rollback);
-
-	def_guard.is_active = false;
 }
 
 /**
diff --git a/test/box/ddl.result b/test/box/ddl.result
index 0eef37992..87b9581c6 100644
--- a/test/box/ddl.result
+++ b/test/box/ddl.result
@@ -289,6 +289,16 @@ box.internal.collation.create('test', 'ICU', 'ru_RU', {strength='primary'}) --ok
 box.internal.collation.drop('test') --ok
 ---
 ...
+c = box.space._collation:get{1}:totable()
+---
+...
+c[2] = 'unicode_test'
+---
+...
+box.space._collation:replace(c)
+---
+- error: collation does not support alter
+...
 box.begin() box.internal.collation.create('test2', 'ICU', 'ru_RU')
 ---
 - error: Space _collation does not support multi-statement transactions
diff --git a/test/box/ddl.test.lua b/test/box/ddl.test.lua
index 820fe7d4d..a1502ae13 100644
--- a/test/box/ddl.test.lua
+++ b/test/box/ddl.test.lua
@@ -127,6 +127,9 @@ box.internal.collation.create('test', 'ICU', 'ru_RU', {strength=2}) --ok
 box.internal.collation.drop('test') --ok
 box.internal.collation.create('test', 'ICU', 'ru_RU', {strength='primary'}) --ok
 box.internal.collation.drop('test') --ok
+c = box.space._collation:get{1}:totable()
+c[2] = 'unicode_test'
+box.space._collation:replace(c)
 
 box.begin() box.internal.collation.create('test2', 'ICU', 'ru_RU')
 box.rollback()
-- 
2.15.1 (Apple Git-101)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [tarantool-patches] [PATCH 4/7] Move struct on_access_denied_ctx into error.h
  2018-04-25 23:29 [tarantool-patches] [PATCH 0/7] Expose ICU into Lua Vladislav Shpilevoy
                   ` (2 preceding siblings ...)
  2018-04-25 23:29 ` [tarantool-patches] [PATCH 3/7] alter: fix assertion in collations alter Vladislav Shpilevoy
@ 2018-04-25 23:29 ` Vladislav Shpilevoy
  2018-04-25 23:29 ` [tarantool-patches] [PATCH 5/7] Merge box_error, stat and collations into core library Vladislav Shpilevoy
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Vladislav Shpilevoy @ 2018-04-25 23:29 UTC (permalink / raw)
  To: tarantool-patches; +Cc: kostja

The issue #3290 was not only about upper/lower Lua functions, but
about unicode comparison functions too. Actually, the issue
requests upper/lower exactly to do string comparison, that can be
done more quick with no garbage strings creation. For this
Tarantool collations can be used.

To be able to expose collations into Lua, the coll.h/.c,
coll_def.h/.c and coll_cache.h/.c must be moved from 'box' static
library into 'core' static library so that they will be built
together with string utils. But they require 'stat' and
'box_error' libraries. The patch prepares the files going to be
moved, so in the next patch they are just moved, with no changes.
It saves commit history.
---
 src/box/error.cc |  1 -
 src/box/error.h  | 12 ++++++++++++
 src/box/schema.h | 12 ------------
 3 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/src/box/error.cc b/src/box/error.cc
index 99f519537..4bab5d720 100644
--- a/src/box/error.cc
+++ b/src/box/error.cc
@@ -174,7 +174,6 @@ BuildXlogError(const char *file, unsigned line, const char *format, ...)
 	}
 }
 
-#include "schema.h"
 #include "trigger.h"
 
 struct rlist on_access_denied = RLIST_HEAD_INITIALIZER(on_access_denied);
diff --git a/src/box/error.h b/src/box/error.h
index c791e6c6a..69a1022ad 100644
--- a/src/box/error.h
+++ b/src/box/error.h
@@ -133,6 +133,18 @@ extern const struct type_info type_ClientError;
 extern const struct type_info type_XlogError;
 extern const struct type_info type_AccessDeniedError;
 
+/**
+ * Context passed to on_access_denied trigger.
+ */
+struct on_access_denied_ctx {
+	/** Type of declined access */
+	const char *access_type;
+	/** Type of object the required access was denied to */
+	const char *object_type;
+	/** Name of object the required access was denied to */
+	const char *object_name;
+};
+
 #if defined(__cplusplus)
 } /* extern "C" */
 #include "exception.h"
diff --git a/src/box/schema.h b/src/box/schema.h
index 56f39b3fe..4b2a3da4e 100644
--- a/src/box/schema.h
+++ b/src/box/schema.h
@@ -224,16 +224,4 @@ extern struct rlist on_alter_sequence;
  */
 extern struct rlist on_access_denied;
 
-/**
- * Context passed to on_access_denied trigger.
- */
-struct on_access_denied_ctx {
-	/** Type of declined access */
-	const char *access_type;
-	/** Type of object the required access was denied to */
-	const char *object_type;
-	/** Name of object the required access was denied to */
-	const char *object_name;
-};
-
 #endif /* INCLUDES_TARANTOOL_BOX_SCHEMA_H */
-- 
2.15.1 (Apple Git-101)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [tarantool-patches] [PATCH 5/7] Merge box_error, stat and collations into core library
  2018-04-25 23:29 [tarantool-patches] [PATCH 0/7] Expose ICU into Lua Vladislav Shpilevoy
                   ` (3 preceding siblings ...)
  2018-04-25 23:29 ` [tarantool-patches] [PATCH 4/7] Move struct on_access_denied_ctx into error.h Vladislav Shpilevoy
@ 2018-04-25 23:29 ` Vladislav Shpilevoy
  2018-04-25 23:29 ` [tarantool-patches] [PATCH 6/7] Always store built-in collations in the cache Vladislav Shpilevoy
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Vladislav Shpilevoy @ 2018-04-25 23:29 UTC (permalink / raw)
  To: tarantool-patches; +Cc: kostja

The goal is to expose collations into Lua with no dependencies on
box library. But collations merge into core requires box_error
and stat libraries too.
---
 cmake/module.cmake         |  4 ++--
 src/CMakeLists.txt         | 14 ++++++++++----
 src/box/CMakeLists.txt     | 18 +++++-------------
 src/box/lua/call.c         |  2 +-
 src/box/lua/error.cc       |  2 +-
 src/box/lua/net_box.c      |  2 +-
 src/box/lua/tuple.c        |  2 +-
 src/box/lua/xlog.c         |  2 +-
 src/{box => }/coll.c       |  0
 src/{box => }/coll.h       |  0
 src/{box => }/coll_cache.c |  0
 src/{box => }/coll_cache.h |  0
 src/{box => }/coll_def.c   |  0
 src/{box => }/coll_def.h   |  0
 src/{box => }/errcode.c    |  0
 src/{box => }/errcode.h    |  0
 src/{box => }/error.cc     |  0
 src/{box => }/error.h      |  0
 src/main.cc                |  2 +-
 src/{box => }/opt_def.c    |  0
 src/{box => }/opt_def.h    |  0
 test/unit/CMakeLists.txt   |  8 ++++----
 test/unit/coll.cpp         |  3 +--
 23 files changed, 28 insertions(+), 31 deletions(-)
 rename src/{box => }/coll.c (100%)
 rename src/{box => }/coll.h (100%)
 rename src/{box => }/coll_cache.c (100%)
 rename src/{box => }/coll_cache.h (100%)
 rename src/{box => }/coll_def.c (100%)
 rename src/{box => }/coll_def.h (100%)
 rename src/{box => }/errcode.c (100%)
 rename src/{box => }/errcode.h (100%)
 rename src/{box => }/error.cc (100%)
 rename src/{box => }/error.h (100%)
 rename src/{box => }/opt_def.c (100%)
 rename src/{box => }/opt_def.h (100%)

diff --git a/cmake/module.cmake b/cmake/module.cmake
index 988dcb94d..434938cdf 100644
--- a/cmake/module.cmake
+++ b/cmake/module.cmake
@@ -25,7 +25,7 @@ function(rebuild_module_api)
         COMMAND ${CMAKE_C_COMPILER}
             ${cflags}
             -I ${CMAKE_SOURCE_DIR}/src -I ${CMAKE_BINARY_DIR}/src
-            -E ${CMAKE_SOURCE_DIR}/src/box/errcode.h > ${errcodefile}
+            -E ${CMAKE_SOURCE_DIR}/src/errcode.h > ${errcodefile}
         COMMAND
             grep "enum box_error_code" ${errcodefile} >> ${tmpfile}
         COMMAND cat ${CMAKE_CURRENT_SOURCE_DIR}/module_footer.h >> ${tmpfile}
@@ -34,7 +34,7 @@ function(rebuild_module_api)
         DEPENDS ${CMAKE_SOURCE_DIR}/extra/apigen
                 ${CMAKE_CURRENT_SOURCE_DIR}/module_header.h
                 ${CMAKE_CURRENT_SOURCE_DIR}/module_footer.h
-                ${CMAKE_SOURCE_DIR}/src/box/errcode.h
+                ${CMAKE_SOURCE_DIR}/src/errcode.h
                 ${headers}
         )
 
diff --git a/src/CMakeLists.txt b/src/CMakeLists.txt
index f489c88cf..1032edc57 100644
--- a/src/CMakeLists.txt
+++ b/src/CMakeLists.txt
@@ -94,6 +94,15 @@ set (core_sources
      random.c
      trigger.cc
      http_parser.c
+     error.cc
+     errcode.c
+     rmean.c
+     latency.c
+     histogram.c
+     coll_def.c
+     coll.c
+     coll_cache.c
+     opt_def.c
  )
 
 if (TARGET_OS_NETBSD)
@@ -113,9 +122,6 @@ target_link_libraries(core
     ${ICU_LIBRARIES}
 )
 
-add_library(stat STATIC rmean.c latency.c histogram.c)
-target_link_libraries(stat core)
-
 if (ENABLE_BACKTRACE AND NOT TARGET_OS_DARWIN)
 	target_link_libraries(core gcc_s ${UNWIND_LIBRARIES})
 endif()
@@ -191,7 +197,7 @@ set(api_headers
     ${CMAKE_SOURCE_DIR}/src/box/box.h
     ${CMAKE_SOURCE_DIR}/src/box/index.h
     ${CMAKE_SOURCE_DIR}/src/box/iterator_type.h
-    ${CMAKE_SOURCE_DIR}/src/box/error.h
+    ${CMAKE_SOURCE_DIR}/src/error.h
     ${CMAKE_SOURCE_DIR}/src/box/lua/call.h
     ${CMAKE_SOURCE_DIR}/src/box/lua/tuple.h
     ${CMAKE_SOURCE_DIR}/src/latch.h
diff --git a/src/box/CMakeLists.txt b/src/box/CMakeLists.txt
index ae156a2f5..30d636e47 100644
--- a/src/box/CMakeLists.txt
+++ b/src/box/CMakeLists.txt
@@ -21,15 +21,12 @@ set_property(DIRECTORY PROPERTY ADDITIONAL_MAKE_CLEAN_FILES ${lua_sources})
 
 include_directories(${ZSTD_INCLUDE_DIRS})
 
-add_library(box_error STATIC error.cc errcode.c)
-target_link_libraries(box_error core stat)
-
 add_library(vclock STATIC vclock.c)
 target_link_libraries(vclock core)
 
 add_library(xrow STATIC xrow.c iproto_constants.c)
-target_link_libraries(xrow server core small vclock misc box_error
-                      scramble ${MSGPUCK_LIBRARIES})
+target_link_libraries(xrow server core small vclock misc scramble
+                      ${MSGPUCK_LIBRARIES})
 
 add_library(tuple STATIC
     tuple.c
@@ -41,20 +38,15 @@ add_library(tuple STATIC
     tuple_bloom.c
     tuple_dictionary.c
     key_def.cc
-    coll_def.c
-    coll.c
-    coll_cache.c
     field_def.c
-    opt_def.c
 )
-target_link_libraries(tuple json_path box_error core ${MSGPUCK_LIBRARIES} ${ICU_LIBRARIES} misc bit)
+target_link_libraries(tuple json_path core ${MSGPUCK_LIBRARIES} ${ICU_LIBRARIES} misc bit)
 
 add_library(xlog STATIC xlog.c)
-target_link_libraries(xlog core box_error crc32 ${ZSTD_LIBRARIES})
+target_link_libraries(xlog core crc32 ${ZSTD_LIBRARIES})
 
 add_library(box STATIC
     iproto.cc
-    error.cc
     xrow_io.cc
     tuple_convert.c
     identifier.c
@@ -132,6 +124,6 @@ add_library(box STATIC
     lua/xlog.c
     ${bin_sources})
 
-target_link_libraries(box box_error tuple stat xrow xlog vclock crc32 scramble
+target_link_libraries(box tuple xrow xlog vclock crc32 scramble
                       ${common_libraries})
 add_dependencies(box build_bundled_libs)
diff --git a/src/box/lua/call.c b/src/box/lua/call.c
index be13812aa..b60c6c397 100644
--- a/src/box/lua/call.c
+++ b/src/box/lua/call.c
@@ -30,7 +30,7 @@
  */
 #include "box/lua/call.h"
 #include "box/call.h"
-#include "box/error.h"
+#include <error.h>
 #include "fiber.h"
 
 #include "lua/utils.h"
diff --git a/src/box/lua/error.cc b/src/box/lua/error.cc
index 314907421..960ea2aa9 100644
--- a/src/box/lua/error.cc
+++ b/src/box/lua/error.cc
@@ -40,7 +40,7 @@ extern "C" {
 #include <errinj.h>
 
 #include "lua/utils.h"
-#include "box/error.h"
+#include "src/error.h"
 
 static int
 luaT_error_raise(lua_State *L)
diff --git a/src/box/lua/net_box.c b/src/box/lua/net_box.c
index db2d2dbb4..f5cc8d674 100644
--- a/src/box/lua/net_box.c
+++ b/src/box/lua/net_box.c
@@ -43,7 +43,7 @@
 #include "third_party/base64.h"
 
 #include "coio.h"
-#include "box/errcode.h"
+#include "errcode.h"
 #include "lua/fiber.h"
 
 #define cfg luaL_msgpack_default
diff --git a/src/box/lua/tuple.c b/src/box/lua/tuple.c
index 67abb32bd..889f0569b 100644
--- a/src/box/lua/tuple.c
+++ b/src/box/lua/tuple.c
@@ -39,7 +39,7 @@
 
 #include "box/tuple.h"
 #include "box/tuple_convert.h"
-#include "box/errcode.h"
+#include "errcode.h"
 #include "box/memtx_tuple.h"
 #include "json/path.h"
 
diff --git a/src/box/lua/xlog.c b/src/box/lua/xlog.c
index 030f5c2d4..c53b5a21a 100644
--- a/src/box/lua/xlog.c
+++ b/src/box/lua/xlog.c
@@ -35,7 +35,7 @@
 #include <diag.h>
 #include <msgpuck/msgpuck.h>
 
-#include <box/error.h>
+#include "error.h"
 #include <box/xlog.h>
 #include <box/xrow.h>
 #include <box/iproto_constants.h>
diff --git a/src/box/coll.c b/src/coll.c
similarity index 100%
rename from src/box/coll.c
rename to src/coll.c
diff --git a/src/box/coll.h b/src/coll.h
similarity index 100%
rename from src/box/coll.h
rename to src/coll.h
diff --git a/src/box/coll_cache.c b/src/coll_cache.c
similarity index 100%
rename from src/box/coll_cache.c
rename to src/coll_cache.c
diff --git a/src/box/coll_cache.h b/src/coll_cache.h
similarity index 100%
rename from src/box/coll_cache.h
rename to src/coll_cache.h
diff --git a/src/box/coll_def.c b/src/coll_def.c
similarity index 100%
rename from src/box/coll_def.c
rename to src/coll_def.c
diff --git a/src/box/coll_def.h b/src/coll_def.h
similarity index 100%
rename from src/box/coll_def.h
rename to src/coll_def.h
diff --git a/src/box/errcode.c b/src/errcode.c
similarity index 100%
rename from src/box/errcode.c
rename to src/errcode.c
diff --git a/src/box/errcode.h b/src/errcode.h
similarity index 100%
rename from src/box/errcode.h
rename to src/errcode.h
diff --git a/src/box/error.cc b/src/error.cc
similarity index 100%
rename from src/box/error.cc
rename to src/error.cc
diff --git a/src/box/error.h b/src/error.h
similarity index 100%
rename from src/box/error.h
rename to src/error.h
diff --git a/src/main.cc b/src/main.cc
index 1682baea0..3d61e3c51 100644
--- a/src/main.cc
+++ b/src/main.cc
@@ -63,7 +63,7 @@
 #include "tt_pthread.h"
 #include "lua/init.h"
 #include "box/box.h"
-#include "box/error.h"
+#include "error.h"
 #include "scoped_guard.h"
 #include "random.h"
 #include "tt_uuid.h"
diff --git a/src/box/opt_def.c b/src/opt_def.c
similarity index 100%
rename from src/box/opt_def.c
rename to src/opt_def.c
diff --git a/src/box/opt_def.h b/src/opt_def.h
similarity index 100%
rename from src/box/opt_def.h
rename to src/opt_def.h
diff --git a/test/unit/CMakeLists.txt b/test/unit/CMakeLists.txt
index 049edf641..8792a3ed2 100644
--- a/test/unit/CMakeLists.txt
+++ b/test/unit/CMakeLists.txt
@@ -90,10 +90,10 @@ add_executable(fiber_channel_stress.test fiber_channel_stress.cc)
 target_link_libraries(fiber_channel_stress.test core)
 
 add_executable(cbus_stress.test cbus_stress.c)
-target_link_libraries(cbus_stress.test core stat)
+target_link_libraries(cbus_stress.test core)
 
 add_executable(cbus.test cbus.c)
-target_link_libraries(cbus.test core unit stat)
+target_link_libraries(cbus.test core unit)
 
 add_executable(coio.test coio.cc)
 target_link_libraries(coio.test core eio bit uri unit)
@@ -133,9 +133,9 @@ add_executable(json_path.test json_path.c)
 target_link_libraries(json_path.test json_path unit ${ICU_LIBRARIES})
 
 add_executable(rmean.test rmean.cc)
-target_link_libraries(rmean.test stat unit)
+target_link_libraries(rmean.test core unit)
 add_executable(histogram.test histogram.c)
-target_link_libraries(histogram.test stat unit)
+target_link_libraries(histogram.test core unit)
 
 add_executable(say.test say.c)
 target_link_libraries(say.test core unit)
diff --git a/test/unit/coll.cpp b/test/unit/coll.cpp
index e1643c9c0..acf69b030 100644
--- a/test/unit/coll.cpp
+++ b/test/unit/coll.cpp
@@ -1,9 +1,8 @@
-#include "box/coll.h"
+#include "coll.h"
 #include <iostream>
 #include <vector>
 #include <algorithm>
 #include <string.h>
-#include <box/coll_def.h>
 #include <assert.h>
 #include <msgpuck.h>
 #include <diag.h>
-- 
2.15.1 (Apple Git-101)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [tarantool-patches] [PATCH 6/7] Always store built-in collations in the cache
  2018-04-25 23:29 [tarantool-patches] [PATCH 0/7] Expose ICU into Lua Vladislav Shpilevoy
                   ` (4 preceding siblings ...)
  2018-04-25 23:29 ` [tarantool-patches] [PATCH 5/7] Merge box_error, stat and collations into core library Vladislav Shpilevoy
@ 2018-04-25 23:29 ` Vladislav Shpilevoy
  2018-04-25 23:29 ` [tarantool-patches] [PATCH 7/7] lua: expose u_compare/u_icompare into Lua Vladislav Shpilevoy
  2018-04-28  1:55 ` [tarantool-patches] Re: [PATCH 0/7] Expose ICU " Alexander Turenko
  7 siblings, 0 replies; 14+ messages in thread
From: Vladislav Shpilevoy @ 2018-04-25 23:29 UTC (permalink / raw)
  To: tarantool-patches; +Cc: kostja

To be able to use collations out of box.cfg it is needed to be
sure, that they are available always. Else it can break non-box
collation usage. So when a built-in collation is deleted from
_collation (for example, on box.internal.bootstrap()), it is not
deleted from the cache. When a built-in collation is inserted
into _collation, it does not touch the cache (it already contains
all built-in collations).
---
 src/box/alter.cc | 14 ++++++++++++++
 src/coll_cache.c | 34 ++++++++++++++++++++++++++++++++++
 src/coll_def.c   |  7 +++++++
 src/coll_def.h   | 13 +++++++++++++
 4 files changed, 68 insertions(+)

diff --git a/src/box/alter.cc b/src/box/alter.cc
index be2ccc061..39fcc8de4 100644
--- a/src/box/alter.cc
+++ b/src/box/alter.cc
@@ -2429,6 +2429,13 @@ on_replace_dd_collation(struct trigger * /* trigger */, void *event)
 		/* TODO: Check that no index uses the collation */
 		int32_t old_id = tuple_field_u32_xc(old_tuple,
 						    BOX_COLLATION_FIELD_ID);
+		if (coll_is_system(old_id)) {
+			/*
+			 * Built-in collations can be deleted from
+			 * _collation, but not from the cache.
+			 */
+			return;
+		}
 		struct coll *old_coll = coll_by_id(old_id);
 		assert(old_coll != NULL);
 		access_check_ddl(old_coll->name, old_coll->owner_id,
@@ -2451,6 +2458,13 @@ on_replace_dd_collation(struct trigger * /* trigger */, void *event)
 		coll_def_new_from_tuple(new_tuple, &new_def);
 		access_check_ddl(new_def.name, new_def.owner_id, SC_COLLATION,
 				 PRIV_C, false);
+		if (coll_is_system(new_def.id)) {
+			/*
+			 * Built-in collation is in the cache
+			 * already.
+			 */
+			return;
+		}
 		/*
 		 * Set on_rollback before the collation is
 		 * inserted into the cache. Else if the trigger
diff --git a/src/coll_cache.c b/src/coll_cache.c
index b7eb3edb9..b7fce5aba 100644
--- a/src/coll_cache.c
+++ b/src/coll_cache.c
@@ -45,7 +45,41 @@ coll_cache_init()
 			 "coll_cache_id");
 		return -1;
 	}
+	struct coll *replaced, *unicode_coll, *unicode_ci_coll;
+	struct coll_def def;
+	memset(&def, 0, sizeof(def));
+	def.id = COLLATION_ID_UNICODE;
+	def.owner_id = 1;
+	def.name = "unicode";
+	def.name_len = strlen(def.name);
+	def.locale = "";
+	def.type = COLL_TYPE_ICU;
+	unicode_coll = coll_new(&def);
+	if (unicode_coll == NULL)
+		goto err_unicode;
+	if (coll_cache_replace(unicode_coll, &replaced) != 0)
+		goto err_unicode_replace;
+	assert(replaced == NULL);
+
+	def.id = COLLATION_ID_UNICODE_CI;
+	def.name = "unicode_ci";
+	def.name_len = strlen(def.name);
+	def.icu.strength = COLL_ICU_STRENGTH_PRIMARY;
+	unicode_ci_coll = coll_new(&def);
+	if (unicode_ci_coll == NULL)
+		goto err_unicode_ci;
+	if (coll_cache_replace(unicode_ci_coll, &replaced) != 0)
+		goto err_unicode_ci_replace;
 	return 0;
+err_unicode_ci_replace:
+	coll_delete(unicode_ci_coll);
+err_unicode_ci:
+	coll_cache_delete(unicode_coll);
+err_unicode_replace:
+	coll_delete(unicode_coll);
+err_unicode:
+	mh_i32ptr_delete(coll_cache_id);
+	return -1;
 }
 
 /** Delete global hash tables. */
diff --git a/src/coll_def.c b/src/coll_def.c
index f849845b3..0a0ead9a1 100644
--- a/src/coll_def.c
+++ b/src/coll_def.c
@@ -63,6 +63,13 @@ const char *coll_icu_strength_strs[] = {
 	"IDENTICAL"
 };
 
+bool
+coll_is_system(int coll_id)
+{
+	return coll_id == COLLATION_ID_UNICODE ||
+	       coll_id == COLLATION_ID_UNICODE_CI;
+}
+
 static int64_t
 icu_on_off_from_str(const char *str, uint32_t len)
 {
diff --git a/src/coll_def.h b/src/coll_def.h
index 7a1027a1e..e86c546e5 100644
--- a/src/coll_def.h
+++ b/src/coll_def.h
@@ -49,6 +49,19 @@ enum coll_type {
 
 extern const char *coll_type_strs[];
 
+/** Built-in collations. */
+enum {
+	COLLATION_ID_UNICODE = 1,
+	COLLATION_ID_UNICODE_CI = 2,
+};
+
+/**
+ * Check if a collation is system by its ID. System collations can
+ * not be deleted.
+ */
+bool
+coll_is_system(int coll_id);
+
 /*
  * ICU collation options. See
  * http://icu-project.org/apiref/icu4c/ucol_8h.html#a583fbe7fc4a850e2fcc692e766d2826c
-- 
2.15.1 (Apple Git-101)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [tarantool-patches] [PATCH 7/7] lua: expose u_compare/u_icompare into Lua
  2018-04-25 23:29 [tarantool-patches] [PATCH 0/7] Expose ICU into Lua Vladislav Shpilevoy
                   ` (5 preceding siblings ...)
  2018-04-25 23:29 ` [tarantool-patches] [PATCH 6/7] Always store built-in collations in the cache Vladislav Shpilevoy
@ 2018-04-25 23:29 ` Vladislav Shpilevoy
  2018-04-28  1:55 ` [tarantool-patches] Re: [PATCH 0/7] Expose ICU " Alexander Turenko
  7 siblings, 0 replies; 14+ messages in thread
From: Vladislav Shpilevoy @ 2018-04-25 23:29 UTC (permalink / raw)
  To: tarantool-patches; +Cc: kostja

Lua has no built-in way to correctly compare unicode strings. But
Tarantool links with ICU, so lets expose its collators into Lua.
They are now out of box, and can be used in common libraries.

Follow up #3290
---
 extra/exports                |  2 ++
 src/CMakeLists.txt           |  2 +-
 src/lua/string.lua           | 35 +++++++++++++++++++++++++++++++++++
 src/util.c                   | 31 +++++++++++++++++++++++++++++++
 test/app-tap/string.test.lua | 18 +++++++++++++++++-
 test/box/ddl.result          | 15 +++++++++++++++
 test/box/ddl.test.lua        |  8 ++++++++
 7 files changed, 109 insertions(+), 2 deletions(-)

diff --git a/extra/exports b/extra/exports
index b0480fe79..efcc3011c 100644
--- a/extra/exports
+++ b/extra/exports
@@ -41,6 +41,8 @@ title_get_status
 exception_get_string
 exception_get_int
 u_count
+u_compare
+u_icompare
 
 tarantool_lua_ibuf
 uuid_nil
diff --git a/src/CMakeLists.txt b/src/CMakeLists.txt
index 1032edc57..0ca41cfaf 100644
--- a/src/CMakeLists.txt
+++ b/src/CMakeLists.txt
@@ -114,7 +114,7 @@ endif ()
 
 add_library(core STATIC ${core_sources})
 target_link_libraries(core
-    salad small pthread
+    salad small pthread misc
     ${LIBEV_LIBRARIES}
     ${LIBEIO_LIBRARIES}
     ${LIBCORO_LIBRARIES}
diff --git a/src/lua/string.lua b/src/lua/string.lua
index 6c566cb54..ce12c3f5d 100644
--- a/src/lua/string.lua
+++ b/src/lua/string.lua
@@ -32,6 +32,12 @@ ffi.cdef[[
 
     int
     u_count(const char *s, int bsize, uint8_t flags);
+
+    int
+    u_compare(const char *s1, size_t len1, const char *s2, size_t len2);
+
+    int
+    u_icompare(const char *s1, size_t len1, const char *s2, size_t len2);
 ]]
 
 local c_char_ptr = ffi.typeof('const char *')
@@ -503,6 +509,33 @@ local function string_u_count(inp, opts)
     end
 end
 
+--
+-- Compare two UTF8 strings.
+-- @param inp1 First string.
+-- @param inp2 Second string.
+-- @param func Comparator - case sensitive or insensitive.
+-- @param usage Error on incorrect usage.
+-- @retval  <0 inp1 < inp2
+-- @retval  >0 inp1 > inp2
+-- @retval ==0 inp1 == inp2
+--
+local function string_u_compare_impl(inp1, inp2, func, usage)
+    if type(inp1) ~= 'string' or type(inp2) ~= 'string' then
+        error(usage)
+    end
+    return func(c_char_ptr(inp1), #inp1, c_char_ptr(inp2), #inp2)
+end
+
+local function string_u_compare(inp1, inp2)
+    return string_u_compare_impl(inp1, inp2, ffi.C.u_compare,
+                                 'Usage: string.u_compare(<string>, <string>)')
+end
+
+local function string_u_icompare(inp1, inp2)
+    return string_u_compare_impl(inp1, inp2, ffi.C.u_icompare,
+                                 'Usage: string.u_icompare(<string>, <string>)')
+end
+
 -- It'll automatically set string methods, too.
 local string = require('string')
 string.split      = string_split
@@ -518,3 +551,5 @@ string.rstrip      = string_rstrip
 string.u_upper    = string_u_upper
 string.u_lower    = string_u_lower
 string.u_count    = string_u_count
+string.u_compare  = string_u_compare
+string.u_icompare = string_u_icompare
diff --git a/src/util.c b/src/util.c
index c117dee05..0f4d89b71 100644
--- a/src/util.c
+++ b/src/util.c
@@ -45,6 +45,7 @@
 #include <msgpuck/msgpuck.h> /* mp_char2escape[] table */
 
 #include "say.h"
+#include "coll_cache.h"
 
 /** Find a string in an array of strings.
  *
@@ -367,3 +368,33 @@ u_count(const char *s, int bsize, uint8_t flags)
 	}
 	return len;
 }
+
+/**
+ * Compare two UTF8 strings.
+ * @param s1 First string.
+ * @param len1 Binary size of @a s1.
+ * @param s2 Second string.
+ * @param len2 Binary size of @a s2.
+ * @retval Same as strcmp.
+ */
+int
+u_compare(const char *s1, size_t len1, const char *s2, size_t len2)
+{
+	struct coll *coll = coll_by_id(COLLATION_ID_UNICODE);
+	return coll->cmp(s1, len1, s2, len2, coll);
+}
+
+/**
+ * Case insensitive compare two UTF8 strings.
+ * @param s1 First string.
+ * @param len1 Binary size of @a s1.
+ * @param s2 Second string.
+ * @param len2 Binary size of @a s2.
+ * @retval Same as strcmp.
+ */
+int
+u_icompare(const char *s1, size_t len1, const char *s2, size_t len2)
+{
+	struct coll *coll = coll_by_id(COLLATION_ID_UNICODE_CI);
+	return coll->cmp(s1, len1, s2, len2, coll);
+}
diff --git a/test/app-tap/string.test.lua b/test/app-tap/string.test.lua
index 650a5982d..f357304a0 100755
--- a/test/app-tap/string.test.lua
+++ b/test/app-tap/string.test.lua
@@ -115,7 +115,7 @@ test:test("hex", function(test)
 end)
 
 test:test("unicode", function(test)
-    test:plan(24)
+    test:plan(37)
     local str = 'хеЛлоу вОрЛд ё Ё я Я э Э ъ Ъ hElLo WorLd 1234 i I İ 勺#☢༺'
     local upper_res = 'ХЕЛЛОУ ВОРЛД Ё Ё Я Я Э Э Ъ Ъ HELLO WORLD 1234 I I İ 勺#☢༺'
     local upper_turkish = 'ХЕЛЛОУ ВОРЛД Ё Ё Я Я Э Э Ъ Ъ HELLO WORLD 1234 İ I İ 勺#☢༺'
@@ -164,6 +164,22 @@ test:test("unicode", function(test)
     test:is(string.u_count(str, {digit = true}), 4, 'option digit')
     test:is(string.u_count(str, {digit = true, upper = true}), 17,
             'options digit and upper')
+    -- Test compare.
+    local s1 = '☢'
+    local s2 = 'İ'
+    test:is(s1 < s2, false, 'test binary cmp')
+    test:is(string.u_compare(s1, s2) < 0, true, 'test unicode <')
+    test:is(string.u_compare(s1, s1) == 0, true, 'test unicode eq')
+    test:is(string.u_compare(s2, s1) > 0, true, 'test unicode >')
+    test:is(string.u_icompare('a', 'A') == 0, true, 'test icase ==')
+    test:is(string.u_icompare('b', 'A') > 0, true, 'test icase >, first')
+    test:is(string.u_icompare('B', 'a') > 0, true, 'test icase >, second >')
+    test:is(string.u_compare('', '') == 0, true, 'test empty compare')
+    test:is(string.u_compare('', 'a') < 0, true, 'test left empty compare')
+    test:is(string.u_compare('a', '') > 0, true, 'test right empty compare')
+    test:is(string.u_icompare('', '') == 0, true, 'test empty icompare')
+    test:is(string.u_icompare('', 'a') < 0, true, 'test left empty icompare')
+    test:is(string.u_icompare('a', '') > 0, true, 'test right empty icompare')
 end)
 
 test:test("strip", function(test)
diff --git a/test/box/ddl.result b/test/box/ddl.result
index 87b9581c6..a5e3d7206 100644
--- a/test/box/ddl.result
+++ b/test/box/ddl.result
@@ -500,6 +500,21 @@ box.space._collation.index.name:delete{'test'}
 - [3, 'test', 0, 'ICU', 'ru_RU', {}]
 ...
 --
+-- gh-3290: expose ICU into Lua. It uses built-in collations, that
+-- must work even if a collation is deleted from _collation.
+--
+t = box.space._collation:delete{1}
+---
+...
+string.u_compare('abc', 'def')
+---
+- -1
+...
+box.space._collation:replace(t)
+---
+- [1, 'unicode', 1, 'ICU', '', {}]
+...
+--
 -- gh-2839: allow to store custom fields in field definition.
 --
 format = {}
diff --git a/test/box/ddl.test.lua b/test/box/ddl.test.lua
index a1502ae13..9e4577069 100644
--- a/test/box/ddl.test.lua
+++ b/test/box/ddl.test.lua
@@ -191,6 +191,14 @@ test_run:cmd('restart server default')
 box.space._collation:select{}
 box.space._collation.index.name:delete{'test'}
 
+--
+-- gh-3290: expose ICU into Lua. It uses built-in collations, that
+-- must work even if a collation is deleted from _collation.
+--
+t = box.space._collation:delete{1}
+string.u_compare('abc', 'def')
+box.space._collation:replace(t)
+
 --
 -- gh-2839: allow to store custom fields in field definition.
 --
-- 
2.15.1 (Apple Git-101)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [tarantool-patches] Re: [PATCH 2/7] lua: implement string.u_count
  2018-04-25 23:29 ` [tarantool-patches] [PATCH 2/7] lua: implement string.u_count Vladislav Shpilevoy
@ 2018-04-26 10:36   ` Vladislav Shpilevoy
  2018-04-26 16:07   ` Vladislav Shpilevoy
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 14+ messages in thread
From: Vladislav Shpilevoy @ 2018-04-26 10:36 UTC (permalink / raw)
  To: tarantool-patches; +Cc: kostja, Alexander Turenko

On the branch I added two new options - 'title' and 'letter'. Unicode
has symbols, that are neither upper or lower - they are title. It is
two-char symbols like Dž.

'letter' is just OR of 'upper', 'lower' and 'title'.

On 26/04/2018 02:29, Vladislav Shpilevoy wrote:
> Lua can not calculate length of a unicode string correctly. But
> Tarantool has ICU on board - lets use it to calculate length.
> 
> u_count has options, that allows to count only symbols of a
> specific class, for example, only capital letters, or digits.
> Options can be combined.
> 
> Closes #3081
> ---
>   extra/exports                |  1 +
>   src/CMakeLists.txt           |  1 +
>   src/lua/string.lua           | 52 ++++++++++++++++++++++++++++++++++++++++++++
>   src/util.c                   | 48 +++++++++++++++++++++++++++++++++++++++-
>   test/app-tap/string.test.lua | 22 ++++++++++++++++++-
>   5 files changed, 122 insertions(+), 2 deletions(-)
> 
> diff --git a/extra/exports b/extra/exports
> index a274bb23b..b0480fe79 100644
> --- a/extra/exports
> +++ b/extra/exports
> @@ -40,6 +40,7 @@ title_set_status
>   title_get_status
>   exception_get_string
>   exception_get_int
> +u_count
>   
>   tarantool_lua_ibuf
>   uuid_nil
> diff --git a/src/CMakeLists.txt b/src/CMakeLists.txt
> index 8ab09e968..f489c88cf 100644
> --- a/src/CMakeLists.txt
> +++ b/src/CMakeLists.txt
> @@ -110,6 +110,7 @@ target_link_libraries(core
>       ${LIBEIO_LIBRARIES}
>       ${LIBCORO_LIBRARIES}
>       ${MSGPUCK_LIBRARIES}
> +    ${ICU_LIBRARIES}
>   )
>   
>   add_library(stat STATIC rmean.c latency.c histogram.c)
> diff --git a/src/lua/string.lua b/src/lua/string.lua
> index 1c7226143..6c566cb54 100644
> --- a/src/lua/string.lua
> +++ b/src/lua/string.lua
> @@ -29,6 +29,9 @@ ffi.cdef[[
>   
>       const char *
>       u_errorName(UErrorCode code);
> +
> +    int
> +    u_count(const char *s, int bsize, uint8_t flags);
>   ]]
>   
>   local c_char_ptr = ffi.typeof('const char *')
> @@ -452,6 +455,54 @@ local function string_u_lower(inp, opts)
>       return string_u_to_case_impl(inp, opts, ffi.C.ucasemap_utf8ToLower, usage)
>   end
>   
> +local U_COUNT_CLASS_ALL = 0
> +local U_COUNT_CLASS_UPPER_LETTER = 1
> +local U_COUNT_CLASS_LOWER_LETTER = 2
> +local U_COUNT_CLASS_DIGIT = 4
> +
> +--
> +-- Calculate count of symbols matching the needed classes.
> +-- @param inp Input UTF8 string.
> +-- @param opts Options with needed classes. It supports 'all',
> +--        'upper', 'lower', 'digit'. Opts is a table, where needed
> +--        class key is set to true. By default all classes are
> +--        needed, and count works like strlen (not bsize, like Lua
> +--        operator '#').
> +-- @retval not nil Summary count of needed symbols.
> +-- @retval nil, position Invalid UTF8 on returned position.
> +--
> +local function string_u_count(inp, opts)
> +    local usage = 'Usage: string.u_count(str)'
> +    if type(inp) ~= 'string' then
> +        error(usage)
> +    end
> +    local flags = 0
> +    if opts then
> +        if type(opts) ~= 'table' then
> +            error(usage)
> +        end
> +        if not opts.all then
> +            if opts.upper then
> +                flags = bit.bor(flags, U_COUNT_CLASS_UPPER_LETTER)
> +            end
> +            if opts.lower then
> +                flags = bit.bor(flags, U_COUNT_CLASS_LOWER_LETTER)
> +            end
> +            if opts.digit then
> +                flags = bit.bor(flags, U_COUNT_CLASS_DIGIT)
> +            end
> +        end
> +    end
> +    local len = #inp
> +    inp = c_char_ptr(inp)
> +    local ret = ffi.C.u_count(inp, len, flags)
> +    if ret >= 0 then
> +        return ret
> +    else
> +        return nil, -ret
> +    end
> +end
> +
>   -- It'll automatically set string methods, too.
>   local string = require('string')
>   string.split      = string_split
> @@ -466,3 +517,4 @@ string.lstrip      = string_lstrip
>   string.rstrip      = string_rstrip
>   string.u_upper    = string_u_upper
>   string.u_lower    = string_u_lower
> +string.u_count    = string_u_count
> diff --git a/src/util.c b/src/util.c
> index 9458695b9..c117dee05 100644
> --- a/src/util.c
> +++ b/src/util.c
> @@ -40,7 +40,8 @@
>   #include <time.h>
>   #include <unistd.h>
>   #include <limits.h>
> -
> +#include <unicode/utf8.h>
> +#include <unicode/uchar.h>
>   #include <msgpuck/msgpuck.h> /* mp_char2escape[] table */
>   
>   #include "say.h"
> @@ -321,3 +322,48 @@ fpconv_check()
>   	 */
>   	assert(buf[1] == '.');
>   }
> +
> +enum u_count_class {
> +	U_COUNT_CLASS_ALL = 0,
> +	U_COUNT_CLASS_UPPER_LETTER = 1,
> +	U_COUNT_CLASS_LOWER_LETTER = 2,
> +	U_COUNT_CLASS_DIGIT = 4,
> +};
> +
> +/**
> + * Get length of a UTF8 string.
> + * @param s UTF8 string.
> + * @param bsize Binary size of @an s.
> + * @param flags Binary OR of u_count_class flags.
> + * @retval >=0 Count of symbols matched one of @a flags.
> + * @retval  <0 Invalid UTF8 on the position -1 * returned value.
> + */
> +int
> +u_count(const char *s, int bsize, uint8_t flags)
> +{
> +	int offset = 0;
> +	int len = 0;
> +	UChar32 c;
> +	if (flags == 0) {
> +		/* Fast path - just calculate strlen. */
> +		while (offset < bsize) {
> +			U8_NEXT(s, offset, bsize, c);
> +			if (c == U_SENTINEL)
> +				return -(len + 1);
> +			++len;
> +		}
> +		return len;
> +	}
> +	/* Slow path - must check each symbol to match flags. */
> +	while (offset < bsize) {
> +		U8_NEXT(s, offset, bsize, c);
> +		if (c == U_SENTINEL)
> +			return -(len + 1);
> +		uint8_t f = 0;
> +		f |= (flags & U_COUNT_CLASS_UPPER_LETTER) != 0 && u_isupper(c);
> +		f |= (flags & U_COUNT_CLASS_LOWER_LETTER) != 0 && u_islower(c);
> +		f |= (flags & U_COUNT_CLASS_DIGIT) != 0 && u_isdigit(c);
> +		len += f != 0 ? 1 : 0;
> +	}
> +	return len;
> +}
> diff --git a/test/app-tap/string.test.lua b/test/app-tap/string.test.lua
> index 004e149e9..650a5982d 100755
> --- a/test/app-tap/string.test.lua
> +++ b/test/app-tap/string.test.lua
> @@ -115,7 +115,7 @@ test:test("hex", function(test)
>   end)
>   
>   test:test("unicode", function(test)
> -    test:plan(12)
> +    test:plan(24)
>       local str = 'хеЛлоу вОрЛд ё Ё я Я э Э ъ Ъ hElLo WorLd 1234 i I İ 勺#☢༺'
>       local upper_res = 'ХЕЛЛОУ ВОРЛД Ё Ё Я Я Э Э Ъ Ъ HELLO WORLD 1234 I I İ 勺#☢༺'
>       local upper_turkish = 'ХЕЛЛОУ ВОРЛД Ё Ё Я Я Э Э Ъ Ъ HELLO WORLD 1234 İ I İ 勺#☢༺'
> @@ -144,6 +144,26 @@ test:test("unicode", function(test)
>       s, err = string.u_lower(str, {locale = 'not_existing locale tratatatata'})
>       test:is(s, lower_res, 'incorrect locale turns into default lower')
>       test:isnil(err, 'lower error is nil')
> +
> +    -- Test u_count.
> +    test:is(string.u_count(str), 56, 'u_count works')
> +    s, err = string.u_count("\xE2\x80\xE2")
> +    test:is(err, 1, 'u_count checks for errors')
> +    test:isnil(s, 'retval is nil on error')
> +    test:is(string.u_count(''), 0, 'u_count works on empty strings')
> +    s, err = pcall(string.u_count, 100)
> +    test:isnt(err:find('Usage'), nil, 'usage is checked')
> +    -- Test different symbol classes.
> +    s, err = pcall(string.u_count, str, 1234)
> +    test:isnt(err:find('Usage'), nil, 'usage checks options')
> +    test:is(string.u_count(str, {all = true}), 56, 'option all')
> +    test:is(string.u_count(str, {upper = true}), 13, 'option upper')
> +    test:is(string.u_count(str, {lower = true}), 19, 'option lower')
> +    test:is(string.u_count(str, {upper = true, lower = true}), 32,
> +            'options upper and lower')
> +    test:is(string.u_count(str, {digit = true}), 4, 'option digit')
> +    test:is(string.u_count(str, {digit = true, upper = true}), 17,
> +            'options digit and upper')
>   end)
>   
>   test:test("strip", function(test)
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [tarantool-patches] Re: [PATCH 2/7] lua: implement string.u_count
  2018-04-25 23:29 ` [tarantool-patches] [PATCH 2/7] lua: implement string.u_count Vladislav Shpilevoy
  2018-04-26 10:36   ` [tarantool-patches] " Vladislav Shpilevoy
@ 2018-04-26 16:07   ` Vladislav Shpilevoy
  2018-04-26 23:57   ` Vladislav Shpilevoy
  2018-04-28  1:10   ` Alexander Turenko
  3 siblings, 0 replies; 14+ messages in thread
From: Vladislav Shpilevoy @ 2018-04-26 16:07 UTC (permalink / raw)
  To: tarantool-patches; +Cc: kostja

Removed unused enum value.

diff --git a/src/util.c b/src/util.c
index 24dfe11ce..582fd69b0 100644
--- a/src/util.c
+++ b/src/util.c
@@ -329,10 +329,6 @@ enum u_count_class {
         U_COUNT_CLASS_LOWER_LETTER = 2,
         U_COUNT_CLASS_TITLE_LETTER = 4,
         U_COUNT_CLASS_DIGIT = 8,
-
-       U_COUNT_LETTER = U_COUNT_CLASS_UPPER_LETTER |
-                        U_COUNT_CLASS_LOWER_LETTER |
-                        U_COUNT_CLASS_TITLE_LETTER,
  };

On 26/04/2018 02:29, Vladislav Shpilevoy wrote:
> Lua can not calculate length of a unicode string correctly. But
> Tarantool has ICU on board - lets use it to calculate length.
> 
> u_count has options, that allows to count only symbols of a
> specific class, for example, only capital letters, or digits.
> Options can be combined.
> 
> Closes #3081
> ---
>   extra/exports                |  1 +
>   src/CMakeLists.txt           |  1 +
>   src/lua/string.lua           | 52 ++++++++++++++++++++++++++++++++++++++++++++
>   src/util.c                   | 48 +++++++++++++++++++++++++++++++++++++++-
>   test/app-tap/string.test.lua | 22 ++++++++++++++++++-
>   5 files changed, 122 insertions(+), 2 deletions(-)
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [tarantool-patches] Re: [PATCH 2/7] lua: implement string.u_count
  2018-04-25 23:29 ` [tarantool-patches] [PATCH 2/7] lua: implement string.u_count Vladislav Shpilevoy
  2018-04-26 10:36   ` [tarantool-patches] " Vladislav Shpilevoy
  2018-04-26 16:07   ` Vladislav Shpilevoy
@ 2018-04-26 23:57   ` Vladislav Shpilevoy
  2018-04-28  1:10   ` Alexander Turenko
  3 siblings, 0 replies; 14+ messages in thread
From: Vladislav Shpilevoy @ 2018-04-26 23:57 UTC (permalink / raw)
  To: tarantool-patches; +Cc: kostja

Review fixes after discussion with Alexander.

Remove TITLE option, and introduce a separate LETTER option.

It is needed because Unicode has more letter classes, than
upper/lower/title, but even title is not needed in our API. Lets
just check u_isalpha(), when a letter is needed, and remove title.


diff --git a/src/lua/string.lua b/src/lua/string.lua
index 8e3935963..2b6f5b3d9 100644
--- a/src/lua/string.lua
+++ b/src/lua/string.lua
@@ -464,18 +464,14 @@ end
  local U_COUNT_CLASS_ALL = 0
  local U_COUNT_CLASS_UPPER_LETTER = 1
  local U_COUNT_CLASS_LOWER_LETTER = 2
-local U_COUNT_CLASS_TITLE_LETTER = 4
+local U_COUNT_CLASS_LETTER = 4
  local U_COUNT_CLASS_DIGIT = 8
  
-local U_COUNT_LETTER = bit.bor(U_COUNT_CLASS_UPPER_LETTER,
-                               U_COUNT_CLASS_LOWER_LETTER,
-                               U_COUNT_CLASS_TITLE_LETTER)
-
  --
  -- Calculate count of symbols matching the needed classes.
  -- @param inp Input UTF8 string.
  -- @param opts Options with needed classes. It supports 'all',
---        'upper', 'lower', 'title', 'digit'. Opts is a table,
+--        'upper', 'lower', 'letter', 'digit'. Opts is a table,
  --        where needed class key is set to true. By default all
  --        classes are needed, and count works like strlen (not
  --        bsize, like Lua operator '#').
@@ -500,11 +496,8 @@ local function string_u_count(inp, opts)
                  if opts.lower then
                      flags = bit.bor(flags, U_COUNT_CLASS_LOWER_LETTER)
                  end
-                if opts.title then
-                    flags = bit.bor(flags, U_COUNT_CLASS_TITLE_LETTER)
-                end
              else
-                flags = bit.bor(flags, U_COUNT_LETTER)
+                flags = bit.bor(flags, U_COUNT_CLASS_LETTER)
              end
              if opts.digit then
                  flags = bit.bor(flags, U_COUNT_CLASS_DIGIT)
diff --git a/src/util.c b/src/util.c
index a7a1a35ac..c9eae25f8 100644
--- a/src/util.c
+++ b/src/util.c
@@ -328,7 +328,7 @@ enum u_count_class {
  	U_COUNT_CLASS_ALL = 0,
  	U_COUNT_CLASS_UPPER_LETTER = 1,
  	U_COUNT_CLASS_LOWER_LETTER = 2,
-	U_COUNT_CLASS_TITLE_LETTER = 4,
+	U_COUNT_CLASS_LETTER = 4,
  	U_COUNT_CLASS_DIGIT = 8,
  };
  
@@ -364,7 +364,7 @@ u_count(const char *s, int bsize, uint8_t flags)
  		uint8_t f = 0;
  		f |= (flags & U_COUNT_CLASS_UPPER_LETTER) != 0 && u_isupper(c);
  		f |= (flags & U_COUNT_CLASS_LOWER_LETTER) != 0 && u_islower(c);
-		f |= (flags & U_COUNT_CLASS_TITLE_LETTER) != 0 && u_istitle(c);
+		f |= (flags & U_COUNT_CLASS_LETTER) != 0 && u_isalpha(c);
  		f |= (flags & U_COUNT_CLASS_DIGIT) != 0 && u_isdigit(c);
  		len += f != 0 ? 1 : 0;
  	}
diff --git a/test/app-tap/string.test.lua b/test/app-tap/string.test.lua
index 1b154298f..bbec0c974 100755
--- a/test/app-tap/string.test.lua
+++ b/test/app-tap/string.test.lua
@@ -165,10 +165,12 @@ test:test("unicode", function(test)
      test:is(string.u_count(str, {digit = true}), 4, 'option digit')
      test:is(string.u_count(str, {digit = true, upper = true}), 17,
              'options digit and upper')
-    test:is(string.u_count('Dž', {title = true}), 1, 'option title')
-    test:is(string.u_count('Dž', {upper = true, lower = true}), 0,
-                           'title is not the same as upper or lower')
-    test:is(string.u_count(str..'Dž', {letter = true}), 33, 'option letter')
+    test:is(string.u_count('꜁Dž', {letter = true}), 1,
+            'option letter for title and modifier symbols')
+    test:is(string.u_count('勺', {letter = true}), 1,
+            'option letter for non-case symbols')
+    test:is(string.u_count('勺', {upper = true, lower = true}), 0,
+                           'non-case symbols are not visible for upper/lower')
      -- Test compare.
      local s1 = '☢'
      local s2 = 'İ'

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [tarantool-patches] Re: [PATCH 1/7] lua: expose ICU upper/lower functions to Lua
  2018-04-25 23:29 ` [tarantool-patches] [PATCH 1/7] lua: expose ICU upper/lower functions to Lua Vladislav Shpilevoy
@ 2018-04-28  0:56   ` Alexander Turenko
  0 siblings, 0 replies; 14+ messages in thread
From: Alexander Turenko @ 2018-04-28  0:56 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: kostja, tarantool-patches

Hi Vlad!

Please, consider comments below.

WBR, Alexander Turenko.

On Thu, Apr 26, 2018 at 02:29:01AM +0300, Vladislav Shpilevoy wrote:
> Lua can not work with unicode - in Lua it is enterpreted as a
> binary. On such string built-in upper/lower functions do not
> work. But Tarantool links with ICU that can solve the problem.
> Lets expose ICU upper/lower function into Lua to enable correct
> case transformations.
> 
> Closes #3290
> ---
>  src/lua/init.c               |   2 +-
>  src/lua/string.lua           | 140 +++++++++++++++++++++++++++++++++++++++++++
>  test/app-tap/string.test.lua |  34 ++++++++++-
>  3 files changed, 174 insertions(+), 2 deletions(-)
> 
> <...>
> diff --git a/src/lua/string.lua b/src/lua/string.lua
> index 5ff64c9f6..1c7226143 100644
> --- a/src/lua/string.lua
> +++ b/src/lua/string.lua
> <...>
> +--
> +-- Find cached UCaseMap for @a locale, or create a new one and
> +-- cache it.
> +-- @param locale String locale or box.NULL for default.
> +-- @retval nil Can neither get or create a UCaseMap.
> +-- @retval not nil Needed UCaseMap.
> +--

There are two special values for a locale: NULL is a 'default locale'
means system-wide one (LANG=...) and "" (empty string) means 'root'
locale that is some kind of compromise for language/country neural
applications.

Two things should be considered here:

1. Want we use system-default locale in tarantool by default? A system
   locale which is dependent of environment and can be different on a
   developer system and on a staging/production system. Tarantool is not
   end-user application, but some kind of system software.

2. We should desribe both special values in doxygen-style comment (and,
   then, in documentation) and explicitly state which one is the default
   (and why).

Side note: it seems that box.NULL will be returned in case or an error,
but not nil.

> +local function ucasemap_retrieve(locale)
> +    local ret = ucasemap_cache[locale]
> +    if not ret then
> +        ret = ffi.C.ucasemap_open(c_char_ptr(locale), 0, errcode)
> +        if ret ~= nil then
> +            ffi.gc(ret, ffi.C.ucasemap_close)
> +            ucasemap_cache[locale] = ret
> +        end
> +    end
> +    return ret
> +end

Root locale will be returned in case of non-existent locale (please note
that it can differs from 'default', i.e. system locale). Proposed to at
least document that behaviour. But maybe it would be better to give
explicit error in the case.

> +--
> +-- Check ICU options for string.u_upper/u_lower.
> +-- @param opts Options. Can contain only one option - locale.
> +-- @param usage_err What to throw if opts types are violated.
> +-- @retval String locale if found.
> +-- @retval box.NULL if locale is not found.
> +--

Proposed wording:

> box.NULL if 'locale' field is not found.

To clarify that no lookup for available locales is performed.

> +local function icu_check_case_opts(opts, usage_err)
> <...>
> diff --git a/test/app-tap/string.test.lua b/test/app-tap/string.test.lua
> index 852a7923c..004e149e9 100755
> --- a/test/app-tap/string.test.lua
> +++ b/test/app-tap/string.test.lua
> <...>
> +test:test("unicode", function(test)
> +    test:plan(12)
> +    local str = 'хеЛлоу вОрЛд ё Ё я Я э Э ъ Ъ hElLo WorLd 1234 i I İ 勺#☢༺'

Your implementation correctly handle the empty string. Proposed to add
corresponsing test case.

> +    local upper_res = 'ХЕЛЛОУ ВОРЛД Ё Ё Я Я Э Э Ъ Ъ HELLO WORLD 1234 I I İ 勺#☢༺'
> +    local upper_turkish = 'ХЕЛЛОУ ВОРЛД Ё Ё Я Я Э Э Ъ Ъ HELLO WORLD 1234 İ I İ 勺#☢༺'
> +    local lower_res = 'хеллоу ворлд ё ё я я э э ъ ъ hello world 1234 i i i̇ 勺#☢༺'
> +    local lower_turkish = 'хеллоу ворлд ё ё я я э э ъ ъ hello world 1234 i ı i 勺#☢༺'
> +    local s = string.u_upper(str)
> +    test:is(s, upper_res, 'default locale upper')

The test is system-locale dependent (you can try it like `LANG=tr_TR
./src/tarantool ./test/app-tap/string.test.lua`). Proposed to set LANG
explicitly in the test.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [tarantool-patches] Re: [PATCH 2/7] lua: implement string.u_count
  2018-04-25 23:29 ` [tarantool-patches] [PATCH 2/7] lua: implement string.u_count Vladislav Shpilevoy
                     ` (2 preceding siblings ...)
  2018-04-26 23:57   ` Vladislav Shpilevoy
@ 2018-04-28  1:10   ` Alexander Turenko
  3 siblings, 0 replies; 14+ messages in thread
From: Alexander Turenko @ 2018-04-28  1:10 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: kostja, tarantool-patches

Just one tiny comment below.

WBR, Alexander Turenko.

On Thu, Apr 26, 2018 at 02:29:02AM +0300, Vladislav Shpilevoy wrote:
> Lua can not calculate length of a unicode string correctly. But
> Tarantool has ICU on board - lets use it to calculate length.
> 
> u_count has options, that allows to count only symbols of a
> specific class, for example, only capital letters, or digits.
> Options can be combined.
> 
> Closes #3081
> ---
>  extra/exports                |  1 +
>  src/CMakeLists.txt           |  1 +
>  src/lua/string.lua           | 52 ++++++++++++++++++++++++++++++++++++++++++++
>  src/util.c                   | 48 +++++++++++++++++++++++++++++++++++++++-
>  test/app-tap/string.test.lua | 22 ++++++++++++++++++-
>  5 files changed, 122 insertions(+), 2 deletions(-)
> 
> <...>
> diff --git a/src/util.c b/src/util.c
> index 9458695b9..c117dee05 100644
> --- a/src/util.c
> +++ b/src/util.c
> <...>
> +/**
> + * Get length of a UTF8 string.
> + * @param s UTF8 string.
> + * @param bsize Binary size of @an s.

Whether it worth to clarify that it is w/o trailing '\0'?

> + * @param flags Binary OR of u_count_class flags.
> + * @retval >=0 Count of symbols matched one of @a flags.
> + * @retval  <0 Invalid UTF8 on the position -1 * returned value.
> + */
> +int
> +u_count(const char *s, int bsize, uint8_t flags)
> +{

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [tarantool-patches] Re: [PATCH 0/7] Expose ICU into Lua
  2018-04-25 23:29 [tarantool-patches] [PATCH 0/7] Expose ICU into Lua Vladislav Shpilevoy
                   ` (6 preceding siblings ...)
  2018-04-25 23:29 ` [tarantool-patches] [PATCH 7/7] lua: expose u_compare/u_icompare into Lua Vladislav Shpilevoy
@ 2018-04-28  1:55 ` Alexander Turenko
  7 siblings, 0 replies; 14+ messages in thread
From: Alexander Turenko @ 2018-04-28  1:55 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: kostja, tarantool-patches

Hi Vlad!

Some thoughts / questions re API are below.

1. u_upper/u_lower have support of user-provided locales and follow
   system-default locale by default (that decision is debatable IMHO,
   see the email re 1st patch). But u_compare/u_icompare uses built-in
   collations unconditionally. Should all these functions have some
   unified approach to handle locales?

2. Should we expose these functions into the 'string' module? The module
   seems to be very basic for the language and maybe it worth to be
   conservative in its extending. Lua 5.3 have separate 'utf8' module,
   for example.

3. Should we stick to some existing API to be more friendly for existing
   users?

The examples I found:

- lua 5.3 utf8: https://www.lua.org/manual/5.3/manual.html#6.5 
- lua-utf8: https://github.com/starwing/luautf8
- icu-lua: http://files.luaforge.net/releases/icu-lua/icu-lua/0.1A

From the other side, they seems to don't have character properties
exposing like in our u_count and don't provide ability to set specific
locale. So trying to provide transparent replacement for some parts of
these APIs seems not being a good idea. Just note this possible point
here.

WBR, Alexander Turenko.

On Thu, Apr 26, 2018 at 02:29:00AM +0300, Vladislav Shpilevoy wrote:
> Branch: http://github.com/tarantool/tarantool/tree/gh-3290-lua-icu-ucasemap
> Issue: https://github.com/tarantool/tarantool/issues/3290
> Issue: https://github.com/tarantool/tarantool/issues/3081
> 
> Lua can not work with unicode - in Lua it is enterpreted as a binary. On such
> string built-in upper/lower functions, '#' (length) and comparison operators do
> not work. But Tarantool links with ICU and has comparators with collations, that
> can solve the problems.
> 
> But there is another issue - string methods must be available before box.cfg,
> so the ICU and collations must be built out of main 'box' static library. To do
> this the collation related files are moved from 'box' into 'core' library.
> 
> A second issue is that when box.cfg is called, it inserts built-in collations
> into _collation space, and these insertions can conflict with built-in
> collations, created before box.cfg. Delete from _collation can break the
> collations cache as well. The patchset solves this by checking collations
> deletions and insertions, and if they tries to operate on built-in collations,
> then they are ignored - a user sees changes in _collation, but the cache is
> unchanged.
> 
> Vladislav Shpilevoy (7):
>   lua: expose ICU upper/lower functions to Lua
>   lua: implement string.u_count
>   alter: fix assertion in collations alter
>   Move struct on_access_denied_ctx into error.h
>   Merge box_error, stat and collations into core library
>   Always store built-in collations in the cache
>   lua: expose u_compare/u_icompare into Lua
> 
> <...>

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2018-04-28  1:55 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-25 23:29 [tarantool-patches] [PATCH 0/7] Expose ICU into Lua Vladislav Shpilevoy
2018-04-25 23:29 ` [tarantool-patches] [PATCH 1/7] lua: expose ICU upper/lower functions to Lua Vladislav Shpilevoy
2018-04-28  0:56   ` [tarantool-patches] " Alexander Turenko
2018-04-25 23:29 ` [tarantool-patches] [PATCH 2/7] lua: implement string.u_count Vladislav Shpilevoy
2018-04-26 10:36   ` [tarantool-patches] " Vladislav Shpilevoy
2018-04-26 16:07   ` Vladislav Shpilevoy
2018-04-26 23:57   ` Vladislav Shpilevoy
2018-04-28  1:10   ` Alexander Turenko
2018-04-25 23:29 ` [tarantool-patches] [PATCH 3/7] alter: fix assertion in collations alter Vladislav Shpilevoy
2018-04-25 23:29 ` [tarantool-patches] [PATCH 4/7] Move struct on_access_denied_ctx into error.h Vladislav Shpilevoy
2018-04-25 23:29 ` [tarantool-patches] [PATCH 5/7] Merge box_error, stat and collations into core library Vladislav Shpilevoy
2018-04-25 23:29 ` [tarantool-patches] [PATCH 6/7] Always store built-in collations in the cache Vladislav Shpilevoy
2018-04-25 23:29 ` [tarantool-patches] [PATCH 7/7] lua: expose u_compare/u_icompare into Lua Vladislav Shpilevoy
2018-04-28  1:55 ` [tarantool-patches] Re: [PATCH 0/7] Expose ICU " Alexander Turenko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox