From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from localhost (localhost [127.0.0.1]) by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTP id 98AFA264CA for ; Fri, 17 Aug 2018 07:42:51 -0400 (EDT)
Received: from turing.freelists.org ([127.0.0.1]) by localhost (turing.freelists.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id Kti9O0QKBj40 for ; Fri, 17 Aug 2018 07:42:51 -0400 (EDT)
Received: from smtp42.i.mail.ru (smtp42.i.mail.ru [94.100.177.102]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTPS id 56E42263F2 for ; Fri, 17 Aug 2018 07:42:51 -0400 (EDT)
Subject: [tarantool-patches] Re: [PATCH 1/2] sql: LIKE & GLOB pattern comparison issue
References: <43febf82af3702fadfea135db978ffb6426eb00d.1534436836.git.n.tatunov@tarantool.org> <20180817111727.y6nsbblpm5nh4n3g@tkn_work_nb>
From: Alex Khatskevich
Message-ID: <436d256a-f9d0-781f-8cad-179d7322c7bd@tarantool.org>
Date: Fri, 17 Aug 2018 14:42:47 +0300
MIME-Version: 1.0
In-Reply-To: <20180817111727.y6nsbblpm5nh4n3g@tkn_work_nb>
Content-Type: text/plain; charset="utf-8"; format="flowed"
Content-Transfer-Encoding: 8bit
Content-Language: en-US
Sender: tarantool-patches-bounce@freelists.org
Errors-to: tarantool-patches-bounce@freelists.org
Reply-To: tarantool-patches@freelists.org
List-help:
List-unsubscribe:
List-software: Ecartis version 1.0.0
List-Id: tarantool-patches
List-subscribe:
List-owner:
List-post:
List-archive:
To: Alexander Turenko
Cc: "N.Tatunov" , tarantool-patches@freelists.org, "N.Tatunov"

On 17.08.2018 14:17, Alexander Turenko wrote:
> 0xffff is the result of 'end of a string' check as well as internal buffer
> overflow error. I have the relevant code pasted in the first review of
> the patch (July, 18).
>
> // source/common/ucnv.c::ucnv_getNextUChar
>     s=*source;
>     if(sourceLimit<s) {
>         *err=U_ILLEGAL_ARGUMENT_ERROR;
>         return 0xffff;
>     }
>
> We should not handle the buffer overflow case as an invalid symbol. Of
> course we should not handle it as the 'end of the string' situation.
> Ideally we should perform the pointer check ourselves and raise an error
> in case of 0xffff. I had thought that a buffer overflow error is unlikely
> to occur, but you are right: we should differentiate these situations.
>
> In one of the previous versions of the patch we performed this check like so:
>
> #define Utf8Read(s, e) (((s) < (e)) ?\
>     ucnv_getNextUChar(pUtf8conv, &s, e, &status) : 0)
>
> Not sure why it was changed. Maybe it is an attempt to correctly handle
> the '\0' symbol (it is a valid unicode character)?

The define you have pasted can return 0xffff.

The reasons to change it back are described in the previous patchset.
In short:
1. It is equivalent to
   a. checking s < e in a while loop
   b. reading the next character inside of the while loop body.
2. In some usages of the code this check (s

> So I see three ways to proceed:
>
> 1. Lean on icu's check and ignore possibility of the buffer overflow.
> 2. Use our own check and possibly meet '\0' problems.
> 3. Check for U_ILLEGAL_ARGUMENT_ERROR to treat as end of a string, raise
> the error for other 0xffff.
>
> Alex, what do you suggest here?

As I understand it, 0xffff is currently used ONLY to handle the case of
an unexpectedly ended symbol. E.g. a symbol consists of 2 bytes, but only
1 byte is left in the input buffer. In my opinion this is the same as an
invalid symbol.

I guess that an internal buffer overflow cannot occur in the
`ucnv_getNextUChar` function.

I suppose that it is Nikita's duty to investigate this problem and
explain it to us all. I have just noticed a strange usage.
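
To make the distinction discussed in this thread concrete, here is a minimal sketch of a reader that reports 'end of the string' and 'symbol cut off by the end of the buffer' through separate status codes instead of one bare 0xffff. It is a hand-written UTF-8 stand-in and not ICU: utf8_read, enum read_status and READ_SENTINEL are hypothetical names made up for illustration, and the sketch does not reject overlong forms or surrogates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status codes; these are NOT ICU's UErrorCode values. */
enum read_status {
	READ_OK = 0,
	READ_END,       /* s >= e: nothing left to read */
	READ_TRUNCATED, /* multi-byte symbol cut off by the buffer end */
	READ_INVALID    /* malformed lead or continuation byte */
};

/* Same sentinel value as in the thread, but always paired with a status. */
#define READ_SENTINEL 0xffffu

/*
 * Decode one UTF-8 symbol from [*s, e) and advance *s past the bytes
 * that were consumed. On any failure READ_SENTINEL is returned and
 * *status tells the caller which situation it is.
 */
static uint32_t
utf8_read(const unsigned char **s, const unsigned char *e,
	  enum read_status *status)
{
	assert(s != NULL && *s != NULL && status != NULL);
	const unsigned char *p = *s;
	if (p >= e) {
		*status = READ_END;
		return READ_SENTINEL;
	}
	unsigned char lead = *p++;
	uint32_t c;
	int extra; /* number of continuation bytes expected */
	if (lead < 0x80) {
		c = lead; /* ASCII, including a valid '\0' */
		extra = 0;
	} else if ((lead & 0xe0) == 0xc0) {
		c = lead & 0x1f;
		extra = 1;
	} else if ((lead & 0xf0) == 0xe0) {
		c = lead & 0x0f;
		extra = 2;
	} else if ((lead & 0xf8) == 0xf0) {
		c = lead & 0x07;
		extra = 3;
	} else {
		*s = p;
		*status = READ_INVALID;
		return READ_SENTINEL;
	}
	for (int i = 0; i < extra; i++) {
		if (p >= e) {
			/* The symbol needs more bytes than the buffer has. */
			*s = p;
			*status = READ_TRUNCATED;
			return READ_SENTINEL;
		}
		if ((*p & 0xc0) != 0x80) {
			*s = p;
			*status = READ_INVALID;
			return READ_SENTINEL;
		}
		c = (c << 6) | (uint32_t)(*p++ & 0x3f);
	}
	*s = p;
	*status = READ_OK;
	return c;
}
```

With a reader of this shape the caller can end a LIKE/GLOB loop on READ_END, raise an error on READ_TRUNCATED or READ_INVALID, and still accept '\0' as an ordinary symbol, because the return value alone is never used to detect the end of the input.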