Subject: [tarantool-patches] Re: [PATCH] sql: LIKE/LENGTH process '\0'
From: Vladislav Shpilevoy
To: tarantool-patches@freelists.org, "n.pettik"
Cc: Ivan Koptelov
Date: Wed, 20 Feb 2019 21:08:01 +0300
Message-ID: <76061505-5fd7-3429-d807-3f05c80024df@tarantool.org>

On 20/02/2019 19:04, n.pettik wrote:
>
>
>> On 20 Feb 2019, at 18:47, i.koptelov wrote:
>>
>> Thanks to Alexander, I fixed my patch to use a function
>> from icu to count the length of the string.
>>
>> Changes:
>>
>
> Look, each next implementation again and again changes
> results of certain tests. Lets firstly define exact behaviour of
> length() function and then write function which will satisfy these
> requirements, not vice versa. Is this the final version?
> Moreover, since Konstantin suggest as fast implementation
> as we can, I propose to consider sort of asm written variant:
>
> .global ap_strlen_utf8_s
> ap_strlen_utf8_s:
>         push %esi
>         cld
>         mov 8(%esp), %esi
>         xor %ecx, %ecx
> loopa:  dec %ecx
> loopb:  lodsb
>         shl $1, %al
>         js loopa
>         jc loopb
>         jnz loopa
>         mov %ecx, %eax
>         not %eax
>         pop %esi
>         ret
>
>
> It is taken from http://canonical.org/~kragen/strlen-utf8
> and author claims that quite fast (seems like it doesn’t
> handle \0, but we can patch it). I didn’t bench it, so I am
> not absolutely sure that it ‘way faster’ than other implementations.

https://github.com/Gerold103/tarantool-memes/blob/master/further%20from%20god.jpg
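
For reference, the counting idea behind the quoted assembly (count every byte that is not a UTF-8 continuation byte of the form 10xxxxxx) can be sketched in C roughly as below. This is only an illustration, not the implementation from the patch, which uses icu. The name utf8_len_sketch is hypothetical, and the sketch assumes the input is already valid UTF-8, since it does no validation. Unlike the assembly, it takes an explicit byte length instead of stopping at the first zero byte, so an embedded '\0' is counted as one character.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of the patch): count UTF-8 code
 * points in the first `len` bytes of `s`.
 *
 * Assumes `s` holds valid UTF-8. Every byte that is NOT a
 * continuation byte (10xxxxxx) starts a new code point, so it is
 * enough to count those. The loop is bounded by `len`, not by a
 * terminating zero, so '\0' bytes inside the string are counted.
 */
size_t
utf8_len_sketch(const char *s, size_t len)
{
	size_t count = 0;
	for (size_t i = 0; i < len; i++) {
		/* 0xC0 masks the two top bits; 0x80 means 10xxxxxx. */
		if (((unsigned char)s[i] & 0xC0) != 0x80)
			count++;
	}
	return count;
}
```

Called as utf8_len_sketch("a\0b", 3), this walks all three bytes, whereas a NUL-terminated loop like the one in the assembly would stop after the first byte.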