From: Vladislav Shpilevoy
Subject: [tarantool-patches] [PATCH v3 0/4] Lua utf8 module
Date: Tue, 15 May 2018 22:54:04 +0300
To: tarantool-patches@freelists.org
Cc: kostja@tarantool.org

Branch: http://github.com/tarantool/tarantool/tree/gh-3290-lua-icu
Issue: https://github.com/tarantool/tarantool/issues/3290
Issue: https://github.com/tarantool/tarantool/issues/3385
Issue: https://github.com/tarantool/tarantool/issues/3081

Lua cannot work with Unicode: it interprets a string as a plain byte
sequence. On such strings the built-in upper/lower functions, '#'
(length) and the comparison operators do not work correctly; for
example, '#' returns the number of bytes, not the number of characters.
However, Tarantool links with ICU and has comparators with collations,
which can solve these problems.

There is one more issue: the string methods must be available before
box.cfg, so ICU and collations must be built outside the main 'box'
static library. To do this, collations are split into two submodules:
core and box collations.
Core collation does not have any user-defined attributes such as name,
id or owner: it is just a comparator, a wrapper for UCollator. Box
collation is a wrapper for a core one and adds name, owner and id.

Core collations can be used before box.cfg and have a cache built on
fingerprints. A fingerprint is a string that completely describes the
behavior of a collation. Core collations are never duplicated: if a
collation is requested with a fingerprint that already exists in the
cache, the existing collation is returned and referenced.

Vladislav Shpilevoy (4):
  error: introduce error rebuilding API
  collation: split collation into core and box objects
  collation: introduce collation fingerprint
  lua: introduce utf8 built-in globally visible module

 src/CMakeLists.txt           |   5 +-
 src/box/alter.cc             |  72 +++----
 src/box/coll.c               | 247 ++--------------------
 src/box/coll.h               |  59 ++----
 src/box/coll_cache.c         |  44 ++--
 src/box/coll_cache.h         |  17 +-
 src/box/coll_def.c           |  32 ---
 src/box/coll_def.h           |  86 +-------
 src/box/error.cc             |  27 +++
 src/box/error.h              |   5 +
 src/box/key_def.cc           |  22 +-
 src/box/key_def.h            |   5 +-
 src/box/lua/space.cc         |   8 +-
 src/box/schema.cc            |   8 +-
 src/box/tuple.c              |   4 +-
 src/box/tuple_compare.cc     |   5 +-
 src/box/tuple_hash.cc        |   4 +-
 src/coll.c                   | 352 +++++++++++++++++++++++++++++++
 src/coll.h                   | 113 ++++++++++
 src/coll_def.c               |  63 ++++++
 src/coll_def.h               | 115 +++++++++++
 src/diag.h                   |   9 +
 src/lua/init.c               |   3 +
 src/lua/utf8.c               | 479 +++++++++++++++++++++++++++++++++++++++++++
 src/lua/utf8.h               |  42 ++++
 src/main.cc                  |   3 +
 test/app-tap/string.test.lua | 163 ++++++++++++++-
 test/box/ddl.result          |  15 ++
 test/box/ddl.test.lua        |   8 +
 test/unit/CMakeLists.txt     |   2 +-
 test/unit/coll.cpp           |  47 ++++-
 test/unit/coll.result        |   5 +
 32 files changed, 1582 insertions(+), 487 deletions(-)
 create mode 100644 src/coll.c
 create mode 100644 src/coll.h
 create mode 100644 src/coll_def.c
 create mode 100644 src/coll_def.h
 create mode 100644 src/lua/utf8.c
 create mode 100644 src/lua/utf8.h

--
2.15.1 (Apple Git-101)