From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <tarantool-patches-bounce@freelists.org>
Received: from localhost (localhost [127.0.0.1])
	by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTP id 98AFA264CA
	for <tarantool-patches@freelists.org>; Fri, 17 Aug 2018 07:42:51 -0400 (EDT)
Received: from turing.freelists.org ([127.0.0.1])
	by localhost (turing.freelists.org [127.0.0.1]) (amavisd-new, port 10024)
	with ESMTP id Kti9O0QKBj40 for <tarantool-patches@freelists.org>;
	Fri, 17 Aug 2018 07:42:51 -0400 (EDT)
Received: from smtp42.i.mail.ru (smtp42.i.mail.ru [94.100.177.102])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTPS id 56E42263F2
	for <tarantool-patches@freelists.org>; Fri, 17 Aug 2018 07:42:51 -0400 (EDT)
Subject: [tarantool-patches] Re: [PATCH 1/2] sql: LIKE & GLOB pattern
 comparison issue
References: <cover.1534436835.git.n.tatunov@tarantool.org>
 <43febf82af3702fadfea135db978ffb6426eb00d.1534436836.git.n.tatunov@tarantool.org>
 <d11b496b-1d77-c1ef-27dd-874835bee1b9@tarantool.org>
 <20180817111727.y6nsbblpm5nh4n3g@tkn_work_nb>
From: Alex Khatskevich <avkhatskevich@tarantool.org>
Message-ID: <436d256a-f9d0-781f-8cad-179d7322c7bd@tarantool.org>
Date: Fri, 17 Aug 2018 14:42:47 +0300
MIME-Version: 1.0
In-Reply-To: <20180817111727.y6nsbblpm5nh4n3g@tkn_work_nb>
Content-Type: text/plain; charset="utf-8"; format="flowed"
Content-Transfer-Encoding: 8bit
Content-Language: en-US
Sender: tarantool-patches-bounce@freelists.org
Errors-to: tarantool-patches-bounce@freelists.org
Reply-To: tarantool-patches@freelists.org
List-help: <mailto:ecartis@freelists.org?Subject=help>
List-unsubscribe: <tarantool-patches-request@freelists.org?Subject=unsubscribe>
List-software: Ecartis version 1.0.0
List-Id: tarantool-patches <tarantool-patches.freelists.org>
List-subscribe: <tarantool-patches-request@freelists.org?Subject=subscribe>
List-owner: <mailto:>
List-post: <mailto:tarantool-patches@freelists.org>
List-archive: <http://www.freelists.org/archives/tarantool-patches>
To: Alexander Turenko <alexander.turenko@tarantool.org>
Cc: "N.Tatunov" <n.tatunov@tarantool.org>, tarantool-patches@freelists.org, "N.Tatunov" <hollow653@gmail.com>


On 17.08.2018 14:17, Alexander Turenko wrote:
> 0xffff is the result of the 'end of a string' check as well as of an
> internal buffer overflow error. I pasted the relevant code in the first
> review of the patch (July, 18).
>
> // source/common/ucnv.c::ucnv_getNextUChar
>     s=*source;
>     if(sourceLimit<s) {
>         *err=U_ILLEGAL_ARGUMENT_ERROR;
>         return 0xffff;
>     }
>
> We should not handle the buffer overflow case as an invalid symbol. Of
> course we should not handle it as the 'end of the string' situation
> either. Ideally we should perform the pointer check ourselves and raise
> an error in case of 0xffff. I had thought that a buffer overflow error is
> unlikely to occur, but you are right: we should differentiate these
> situations.
>
> In one of the previous versions of the patch we performed this check like so:
>
> #define Utf8Read(s, e) (((s) < (e)) ?\
> 	ucnv_getNextUChar(pUtf8conv, &s, e, &status) : 0)
>
> Not sure why it was changed. Maybe it is an attempt to correctly handle
> the '\0' symbol (it is a valid Unicode character)?
The define you have pasted can return 0xffff.
The reasons to change it back are described in the previous patchset.
In short:
1. It is equivalent to
    a. checking s < e in the while loop condition
    b. reading the next character inside the while loop body.
2. In some usages of the code this check (s < e) was redundant (it was 
performed a couple of lines above).
3. There is no reason to rewrite the old version of this function. (So 
we decided to use the old version of the function.)
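
To illustrate point 1, here is a minimal sketch, not the actual patch code. `read_one()` is a hypothetical stand-in for `ucnv_getNextUChar()` that just consumes one byte, and `READ_CHECKED`, `count_checked()` and `count_unchecked()` are names made up for this example. It shows the bounds check placed either inside the read macro or in the loop condition:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for ucnv_getNextUChar(): consumes one byte. */
static int read_one(const char **s, const char *e)
{
	(void)e;
	return (unsigned char)*(*s)++;
}

/* Variant A: the bounds check lives inside the read macro. */
#define READ_CHECKED(s, e) (((s) < (e)) ? read_one(&(s), (e)) : 0)

/* Counts characters, stops when the macro yields 0. */
static size_t count_checked(const char *s, const char *e)
{
	size_t n = 0;
	while (READ_CHECKED(s, e) != 0)
		n++;
	return n;
}

/* Variant B: the loop condition checks s < e, the body reads. */
static size_t count_unchecked(const char *s, const char *e)
{
	size_t n = 0;
	while (s < e) {
		(void)read_one(&s, e);
		n++;
	}
	return n;
}
```

For input without an embedded '\0' both variants visit the same characters. With an embedded '\0' variant A stops early, because the macro uses 0 as its end marker, while variant B keeps going up to `e`.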
> So I see three ways to proceed:
>
> 1. Rely on ICU's check and ignore the possibility of a buffer overflow.
> 2. Use our own check and possibly run into '\0' problems.
> 3. Check for U_ILLEGAL_ARGUMENT_ERROR to treat it as the end of a string,
>     and raise an error for any other 0xffff.
>
> Alex, what do you suggest here?
As I understand it, 0xffff is currently used ONLY to handle the case of 
an unexpectedly ended symbol.
E.g. some symbol consists of 2 bytes, but the length of the input 
buffer is 1.
In my opinion this is the same as an invalid symbol.
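
A rough sketch of the situation I mean, under loud assumptions: `read_char()` is a hypothetical, simplified decoder (1- and 2-byte UTF-8 only, no overlong checks), not ICU code, and the `read_status` values are invented for this example. It returns 0xffff both for a truncated symbol and for the end of input, and the status tells them apart:

```c
#include <assert.h>

/* Hypothetical status codes, not ICU's UErrorCode values. */
enum read_status { READ_OK, READ_END, READ_TRUNCATED, READ_INVALID };

#define BAD_CHAR 0xffff

/*
 * Hypothetical simplified decoder: handles only 1- and 2-byte UTF-8
 * sequences and reports everything else as invalid.
 */
static unsigned read_char(const char **s, const char *e, enum read_status *st)
{
	if (*s >= e) {
		/* Nothing left to read. */
		*st = READ_END;
		return BAD_CHAR;
	}
	unsigned char b0 = (unsigned char)*(*s)++;
	if (b0 < 0x80) {
		*st = READ_OK;
		return b0;
	}
	if ((b0 & 0xe0) != 0xc0) {
		/* Not a 2-byte lead byte: unsupported in this sketch. */
		*st = READ_INVALID;
		return BAD_CHAR;
	}
	if (*s >= e) {
		/* The symbol needs 2 bytes, but the buffer ended after 1. */
		*st = READ_TRUNCATED;
		return BAD_CHAR;
	}
	unsigned char b1 = (unsigned char)*(*s)++;
	if ((b1 & 0xc0) != 0x80) {
		*st = READ_INVALID;
		return BAD_CHAR;
	}
	*st = READ_OK;
	return ((b0 & 0x1fu) << 6) | (b1 & 0x3fu);
}
```

A caller that treats READ_TRUNCATED like READ_INVALID gets the behaviour I describe above; READ_END is the only case that means a normal end of the string.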

I guess that an internal buffer overflow cannot occur in the 
`ucnv_getNextUChar` function.

I suppose that it is Nikita's job to investigate this problem and 
explain it to us all. I have just noticed a strange usage.