From: Vladislav Shpilevoy via Tarantool-patches
Reply-To: Vladislav Shpilevoy
Date: Fri, 29 Oct 2021 00:11:37 +0200
To: Mergen Imeev
Cc: tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH v1 2/8] sql: refactor CHAR_LENGTH() function
In-Reply-To: <20211020165844.GB203963@tarantool.org>

Thanks for the fixes!

>>> +/** Implementation of the CHAR_LENGTH() function. */
>>> +static inline uint8_t
>>> +utf8_len_char(char c)
>>> +{
>>> +	uint8_t u = (uint8_t)c;
>>> +	return 1 + (u >= 0xc2) + (u >= 0xe0) + (u >= 0xf0);
>>
>> It is not that simple really. Consider either using the old
>> lengthFunc() and other sqlite utf8 helpers or use the approach
>> similar to utf8_len() in utf8.c. It uses ICU macro U8_NEXT()
>> and has handling for special symbols like U_SENTINEL.
>>
>> Otherwise you are making already third version of functions to
>> work with utf8.
>>
>> I would even prefer to refactor lengthFunc() to stop using sqlite
>> legacy and drop sqlite utf8 entirely, but I suspect it might be
>> not so trivial to do and should be done later.
> I was able to use ucnv_getNextUChar() here. In fact, I was able to use this
> function in all the places in this patch-set where we had to work with my or
> SQLite functions that work with UTF8 characters. I think I can remove sql/utf.c
> in the next patchset, since I refactor the LENGTH() and UNICODE() functions
> there.

We discussed in private that U8_NEXT() would work here just fine.
ucnv_getNextUChar() is overkill, both here and in the other places of
the patchset.
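
To illustrate why the lead-byte-only helper is "not that simple", here is
a minimal sketch in plain C. It is not the patch and it does not use ICU:
count_lead_only(), utf8_valid_len() and count_validating() are hypothetical
names, and the recovery policy (each ill-formed byte counts as one
character) is an assumption that need not match what U8_NEXT() does. It
only shows that trusting the first byte miscounts on malformed input, which
is the kind of case a validating decoder such as U8_NEXT() is meant to
handle.

```c
#include <stddef.h>
#include <stdint.h>

/* Lead-byte-only length, as in the quoted hunk: trusts the first byte. */
static inline uint8_t
utf8_len_char(char c)
{
	uint8_t u = (uint8_t)c;
	return 1 + (u >= 0xc2) + (u >= 0xe0) + (u >= 0xf0);
}

/* Count characters by skipping utf8_len_char() bytes at a time. */
static size_t
count_lead_only(const char *s, size_t len)
{
	size_t n = 0;
	for (size_t i = 0; i < len; i += utf8_len_char(s[i]))
		n++;
	return n;
}

/*
 * Hypothetical validating step: return the byte length of the
 * well-formed UTF-8 sequence starting at s[i], or 0 if it is ill-formed
 * (bad lead byte, bad or missing trail byte, overlong form, surrogate,
 * or code point above U+10FFFF).
 */
static size_t
utf8_valid_len(const char *s, size_t i, size_t len)
{
	uint8_t b0 = (uint8_t)s[i];
	uint8_t lo = 0x80, hi = 0xbf; /* Allowed range of the first trail. */
	size_t need;                  /* Number of trail bytes expected. */
	if (b0 < 0x80)
		return 1;
	if (b0 >= 0xc2 && b0 <= 0xdf) {
		need = 1;
	} else if (b0 >= 0xe0 && b0 <= 0xef) {
		need = 2;
		if (b0 == 0xe0)
			lo = 0xa0; /* Reject overlong 3-byte forms. */
		if (b0 == 0xed)
			hi = 0x9f; /* Reject UTF-16 surrogates. */
	} else if (b0 >= 0xf0 && b0 <= 0xf4) {
		need = 3;
		if (b0 == 0xf0)
			lo = 0x90; /* Reject overlong 4-byte forms. */
		if (b0 == 0xf4)
			hi = 0x8f; /* Reject values above U+10FFFF. */
	} else {
		return 0;
	}
	if (len - i <= need)
		return 0; /* Sequence is truncated by the end of input. */
	for (size_t k = 1; k <= need; k++) {
		uint8_t t = (uint8_t)s[i + k];
		if (t < lo || t > hi)
			return 0;
		lo = 0x80;
		hi = 0xbf;
	}
	return need + 1;
}

/* Count characters; every ill-formed byte counts as one character. */
static size_t
count_validating(const char *s, size_t len)
{
	size_t n = 0;
	for (size_t i = 0; i < len; n++) {
		size_t step = utf8_valid_len(s, i, len);
		i += step != 0 ? step : 1;
	}
	return n;
}
```

On well-formed input both counters agree. On the three bytes E2 61 62 (a
3-byte lead followed by "ab") the lead-byte-only counter swallows the two
ASCII letters and reports one character, while the validating one reports
three.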