<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<br class="">
<blockquote type="cite"
cite="mid:87897608-173E-45EB-80A1-8B249706D8A1@tarantool.org">
<div><br class="">
<blockquote type="cite" class="">
<div class="">On 17 Aug 2018, at 14:42, Alex Khatskevich <<a
href="mailto:avkhatskevich@tarantool.org" class=""
moz-do-not-send="true">avkhatskevich@tarantool.org</a>>
wrote:</div>
<br class="Apple-interchange-newline">
<div class=""><br style="caret-color: rgb(0, 0, 0);
font-family: Helvetica; font-size: 12px; font-style:
normal; font-variant-caps: normal; font-weight: normal;
letter-spacing: normal; text-align: start; text-indent:
0px; text-transform: none; white-space: normal;
word-spacing: 0px; -webkit-text-stroke-width: 0px;
text-decoration: none;" class="">
<span style="caret-color: rgb(0, 0, 0); font-family:
Helvetica; font-size: 12px; font-style: normal;
font-variant-caps: normal; font-weight: normal;
letter-spacing: normal; text-align: start; text-indent:
0px; text-transform: none; white-space: normal;
word-spacing: 0px; -webkit-text-stroke-width: 0px;
text-decoration: none; float: none; display: inline
!important;" class="">On 17.08.2018 14:17, Alexander
Turenko wrote:</span><br style="caret-color: rgb(0, 0, 0);
font-family: Helvetica; font-size: 12px; font-style:
normal; font-variant-caps: normal; font-weight: normal;
letter-spacing: normal; text-align: start; text-indent:
0px; text-transform: none; white-space: normal;
word-spacing: 0px; -webkit-text-stroke-width: 0px;
text-decoration: none;" class="">
<blockquote type="cite" style="font-family: Helvetica;
font-size: 12px; font-style: normal; font-variant-caps:
normal; font-weight: normal; letter-spacing: normal;
orphans: auto; text-align: start; text-indent: 0px;
text-transform: none; white-space: normal; widows: auto;
word-spacing: 0px; -webkit-text-size-adjust: auto;
-webkit-text-stroke-width: 0px; text-decoration: none;"
class="">0xffff is the result of 'end of a string' check
as well as internal buffer<br class="">
overflow error. I have the relevant code pasted in the
first review of<br class="">
the patch (July, 18).<br class="">
<br class="">
// source/common/ucnv.c::ucnv_getNextUChar<br class="">
1860 s=*source;<br class="">
1861 if(sourceLimit<s) {<br class="">
1862 *err=U_ILLEGAL_ARGUMENT_ERROR;<br class="">
1863 return 0xffff;<br class="">
1864 }<br class="">
<br class="">
We should not handle the buffer overflow case as an
invalid symbol. Of<br class="">
course we should not handle it as the 'end of the string'
situation.<br class="">
Ideally we should perform pointer myself and raise an
error in case of<br class="">
0xffff. I had thought that a buffer overflow error is
unlikely to meet,<br class="">
but you are right: we should differentiate these
situations.<br class="">
<br class="">
In one of the previous version of a patch we perform this
check like so:<br class="">
<br class="">
#define Utf8Read(s, e) (((s) < (e)) ?\<br class="">
<span class="Apple-tab-span" style="white-space: pre;"> </span>ucnv_getNextUChar(pUtf8conv,
&s, e, &status) : 0)<br class="">
<br class="">
Don't sure why it was changed. Maybe it is try to
correctly handle '\0'<br class="">
symbol (it is valid unicode character)?<br class="">
</blockquote>
<span style="caret-color: rgb(0, 0, 0); font-family:
Helvetica; font-size: 12px; font-style: normal;
font-variant-caps: normal; font-weight: normal;
letter-spacing: normal; text-align: start; text-indent:
0px; text-transform: none; white-space: normal;
word-spacing: 0px; -webkit-text-stroke-width: 0px;
text-decoration: none; float: none; display: inline
!important;" class="">The define you have pasted can
return 0xffff.</span><br style="caret-color: rgb(0, 0, 0);
font-family: Helvetica; font-size: 12px; font-style:
normal; font-variant-caps: normal; font-weight: normal;
letter-spacing: normal; text-align: start; text-indent:
0px; text-transform: none; white-space: normal;
word-spacing: 0px; -webkit-text-stroke-width: 0px;
text-decoration: none;" class="">
<span style="caret-color: rgb(0, 0, 0); font-family:
Helvetica; font-size: 12px; font-style: normal;
font-variant-caps: normal; font-weight: normal;
letter-spacing: normal; text-align: start; text-indent:
0px; text-transform: none; white-space: normal;
word-spacing: 0px; -webkit-text-stroke-width: 0px;
text-decoration: none; float: none; display: inline
!important;" class="">The reasons to change it back are
described in the previous patchset.</span><br
style="caret-color: rgb(0, 0, 0); font-family: Helvetica;
font-size: 12px; font-style: normal; font-variant-caps:
normal; font-weight: normal; letter-spacing: normal;
text-align: start; text-indent: 0px; text-transform: none;
white-space: normal; word-spacing: 0px;
-webkit-text-stroke-width: 0px; text-decoration: none;"
class="">
<span style="caret-color: rgb(0, 0, 0); font-family:
Helvetica; font-size: 12px; font-style: normal;
font-variant-caps: normal; font-weight: normal;
letter-spacing: normal; text-align: start; text-indent:
0px; text-transform: none; white-space: normal;
word-spacing: 0px; -webkit-text-stroke-width: 0px;
text-decoration: none; float: none; display: inline
!important;" class="">In short:</span><br
style="caret-color: rgb(0, 0, 0); font-family: Helvetica;
font-size: 12px; font-style: normal; font-variant-caps:
normal; font-weight: normal; letter-spacing: normal;
text-align: start; text-indent: 0px; text-transform: none;
white-space: normal; word-spacing: 0px;
-webkit-text-stroke-width: 0px; text-decoration: none;"
class="">
<span style="caret-color: rgb(0, 0, 0); font-family:
Helvetica; font-size: 12px; font-style: normal;
font-variant-caps: normal; font-weight: normal;
letter-spacing: normal; text-align: start; text-indent:
0px; text-transform: none; white-space: normal;
word-spacing: 0px; -webkit-text-stroke-width: 0px;
text-decoration: none; float: none; display: inline
!important;" class="">1. It is equivalent to</span><br
style="caret-color: rgb(0, 0, 0); font-family: Helvetica;
font-size: 12px; font-style: normal; font-variant-caps:
normal; font-weight: normal; letter-spacing: normal;
text-align: start; text-indent: 0px; text-transform: none;
white-space: normal; word-spacing: 0px;
-webkit-text-stroke-width: 0px; text-decoration: none;"
class="">
<span style="caret-color: rgb(0, 0, 0); font-family:
Helvetica; font-size: 12px; font-style: normal;
font-variant-caps: normal; font-weight: normal;
letter-spacing: normal; text-align: start; text-indent:
0px; text-transform: none; white-space: normal;
word-spacing: 0px; -webkit-text-stroke-width: 0px;
text-decoration: none; float: none; display: inline
!important;" class=""> a. check s < e in a while loop</span><br
style="caret-color: rgb(0, 0, 0); font-family: Helvetica;
font-size: 12px; font-style: normal; font-variant-caps:
normal; font-weight: normal; letter-spacing: normal;
text-align: start; text-indent: 0px; text-transform: none;
white-space: normal; word-spacing: 0px;
-webkit-text-stroke-width: 0px; text-decoration: none;"
class="">
<span style="caret-color: rgb(0, 0, 0); font-family:
Helvetica; font-size: 12px; font-style: normal;
font-variant-caps: normal; font-weight: normal;
letter-spacing: normal; text-align: start; text-indent:
0px; text-transform: none; white-space: normal;
word-spacing: 0px; -webkit-text-stroke-width: 0px;
text-decoration: none; float: none; display: inline
!important;" class=""> b. read next character inside of
where loop body.</span><br style="caret-color: rgb(0, 0,
0); font-family: Helvetica; font-size: 12px; font-style:
normal; font-variant-caps: normal; font-weight: normal;
letter-spacing: normal; text-align: start; text-indent:
0px; text-transform: none; white-space: normal;
word-spacing: 0px; -webkit-text-stroke-width: 0px;
text-decoration: none;" class="">
<span style="caret-color: rgb(0, 0, 0); font-family:
Helvetica; font-size: 12px; font-style: normal;
font-variant-caps: normal; font-weight: normal;
letter-spacing: normal; text-align: start; text-indent:
0px; text-transform: none; white-space: normal;
word-spacing: 0px; -webkit-text-stroke-width: 0px;
text-decoration: none; float: none; display: inline
!important;" class="">2. In some usages of the code this
check (s<e) was redundant (it was performed a couple
lines above)</span><br style="caret-color: rgb(0, 0, 0);
font-family: Helvetica; font-size: 12px; font-style:
normal; font-variant-caps: normal; font-weight: normal;
letter-spacing: normal; text-align: start; text-indent:
0px; text-transform: none; white-space: normal;
word-spacing: 0px; -webkit-text-stroke-width: 0px;
text-decoration: none;" class="">
<span style="caret-color: rgb(0, 0, 0); font-family:
Helvetica; font-size: 12px; font-style: normal;
font-variant-caps: normal; font-weight: normal;
letter-spacing: normal; text-align: start; text-indent:
0px; text-transform: none; white-space: normal;
word-spacing: 0px; -webkit-text-stroke-width: 0px;
text-decoration: none; float: none; display: inline
!important;" class="">3. There is no reason to rewrite the
old version of this function. (So, we decided to use old
version of the function)</span><br style="caret-color:
rgb(0, 0, 0); font-family: Helvetica; font-size: 12px;
font-style: normal; font-variant-caps: normal;
font-weight: normal; letter-spacing: normal; text-align:
start; text-indent: 0px; text-transform: none;
white-space: normal; word-spacing: 0px;
-webkit-text-stroke-width: 0px; text-decoration: none;"
class="">
<blockquote type="cite" style="font-family: Helvetica;
font-size: 12px; font-style: normal; font-variant-caps:
normal; font-weight: normal; letter-spacing: normal;
orphans: auto; text-align: start; text-indent: 0px;
text-transform: none; white-space: normal; widows: auto;
word-spacing: 0px; -webkit-text-size-adjust: auto;
-webkit-text-stroke-width: 0px; text-decoration: none;"
class="">So I see two ways to proceed:<br class="">
<br class="">
1. Lean on icu's check and ignore possibility of the
buffer overflow.<br class="">
2. Use our own check and possibly meet '\0' problems.<br
class="">
3. Check for U_ILLEGAL_ARGUMENT_ERROR to treat as end of a
string, raise<br class="">
the error for other 0xffff.<br class="">
<br class="">
Alex, what do you suggests here?<br class="">
</blockquote>
<span style="caret-color: rgb(0, 0, 0); font-family:
Helvetica; font-size: 12px; font-style: normal;
font-variant-caps: normal; font-weight: normal;
letter-spacing: normal; text-align: start; text-indent:
0px; text-transform: none; white-space: normal;
word-spacing: 0px; -webkit-text-stroke-width: 0px;
text-decoration: none; float: none; display: inline
!important;" class="">As I understand, by now the 0xffff
is used ONLY to handle the case of unexpectedly ended
symbol.</span><br style="caret-color: rgb(0, 0, 0);
font-family: Helvetica; font-size: 12px; font-style:
normal; font-variant-caps: normal; font-weight: normal;
letter-spacing: normal; text-align: start; text-indent:
0px; text-transform: none; white-space: normal;
word-spacing: 0px; -webkit-text-stroke-width: 0px;
text-decoration: none;" class="">
<span style="caret-color: rgb(0, 0, 0); font-family:
Helvetica; font-size: 12px; font-style: normal;
font-variant-caps: normal; font-weight: normal;
letter-spacing: normal; text-align: start; text-indent:
0px; text-transform: none; white-space: normal;
word-spacing: 0px; -webkit-text-stroke-width: 0px;
text-decoration: none; float: none; display: inline
!important;" class="">E.g. some symbol consists of 2
characters, but the length of the input buffer is 1.</span><br
style="caret-color: rgb(0, 0, 0); font-family: Helvetica;
font-size: 12px; font-style: normal; font-variant-caps:
normal; font-weight: normal; letter-spacing: normal;
text-align: start; text-indent: 0px; text-transform: none;
white-space: normal; word-spacing: 0px;
-webkit-text-stroke-width: 0px; text-decoration: none;"
class="">
<span style="caret-color: rgb(0, 0, 0); font-family:
Helvetica; font-size: 12px; font-style: normal;
font-variant-caps: normal; font-weight: normal;
letter-spacing: normal; text-align: start; text-indent:
0px; text-transform: none; white-space: normal;
word-spacing: 0px; -webkit-text-stroke-width: 0px;
text-decoration: none; float: none; display: inline
!important;" class="">In my opinion this is the same as an
invalid symbol.</span><br style="caret-color: rgb(0, 0,
0); font-family: Helvetica; font-size: 12px; font-style:
normal; font-variant-caps: normal; font-weight: normal;
letter-spacing: normal; text-align: start; text-indent:
0px; text-transform: none; white-space: normal;
word-spacing: 0px; -webkit-text-stroke-width: 0px;
text-decoration: none;" class="">
<br style="caret-color: rgb(0, 0, 0); font-family:
Helvetica; font-size: 12px; font-style: normal;
font-variant-caps: normal; font-weight: normal;
letter-spacing: normal; text-align: start; text-indent:
0px; text-transform: none; white-space: normal;
word-spacing: 0px; -webkit-text-stroke-width: 0px;
text-decoration: none;" class="">
<span style="caret-color: rgb(0, 0, 0); font-family:
Helvetica; font-size: 12px; font-style: normal;
font-variant-caps: normal; font-weight: normal;
letter-spacing: normal; text-align: start; text-indent:
0px; text-transform: none; white-space: normal;
word-spacing: 0px; -webkit-text-stroke-width: 0px;
text-decoration: none; float: none; display: inline
!important;" class="">I guess that internal buffer
overflow cannot occur in the `ucnv_getNextChar` function.</span><br
style="caret-color: rgb(0, 0, 0); font-family: Helvetica;
font-size: 12px; font-style: normal; font-variant-caps:
normal; font-weight: normal; letter-spacing: normal;
text-align: start; text-indent: 0px; text-transform: none;
white-space: normal; word-spacing: 0px;
-webkit-text-stroke-width: 0px; text-decoration: none;"
class="">
<br style="caret-color: rgb(0, 0, 0); font-family:
Helvetica; font-size: 12px; font-style: normal;
font-variant-caps: normal; font-weight: normal;
letter-spacing: normal; text-align: start; text-indent:
0px; text-transform: none; white-space: normal;
word-spacing: 0px; -webkit-text-stroke-width: 0px;
text-decoration: none;" class="">
<span style="caret-color: rgb(0, 0, 0); font-family:
Helvetica; font-size: 12px; font-style: normal;
font-variant-caps: normal; font-weight: normal;
letter-spacing: normal; text-align: start; text-indent:
0px; text-transform: none; white-space: normal;
word-spacing: 0px; -webkit-text-stroke-width: 0px;
text-decoration: none; float: none; display: inline
!important;" class="">I suppose that it is Nikitas duty to
investigate this problem and explain it to us all. I just
have noticed a strange usage.</span></div>
</blockquote>
</div>
<div class=""><br class="">
</div>
<div class="">Hello, please consider my comments.</div>
<div class=""><br class="">
</div>
<div class="">There are some cases when 0xffff can occur, but:</div>
<div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>1) <span
style="font-family: HelveticaNeue;" class="">Cannot trigger in
our context.</span></div>
<div class=""><span style="font-family: HelveticaNeue;" class=""><span class="Apple-tab-span" style="white-space:pre"> </span>2)
C</span><span style="font-family: HelveticaNeue;" class="">annot
trigger in our context.</span></div>
<div class=""><span style="font-family: HelveticaNeue;" class=""><span class="Apple-tab-span" style="white-space:pre"> </span>3)
O</span><span style="font-family: HelveticaNeue;" class="">nly
triggers if end < start. (Cannot happen in
sql_utf8_pattern_compare, i guess)</span></div>
<div class=""><span style="font-family: HelveticaNeue;" class=""><span class="Apple-tab-span" style="white-space:pre"> </span>4)
O</span><span style="font-family: HelveticaNeue;" class="">nly
triggers if string length > (size_t) 0x7ffffffff (can it
actually happen? I don’t think so).</span></div>
<div class=""><span style="font-family: HelveticaNeue;" class=""><span class="Apple-tab-span" style="white-space:pre"> </span>5)
O</span><span style="font-family: HelveticaNeue;" class="">ccurs
when trying to access to not unindexed data.</span></div>
<div class=""><span style="font-family: HelveticaNeue;" class=""><span class="Apple-tab-span" style="white-space:pre"> </span>6)
Cannot occur in our context.</span></div>
<div class=""><span style="font-family: HelveticaNeue;" class=""><span class="Apple-tab-span" style="white-space:pre"> </span>7) </span><span
style="font-family: HelveticaNeue;" class="">Cannot occur in
our context.</span></div>
</blockquote>
I do not understand what are those numbers related to. Please,
describe it.<br>
<blockquote type="cite"
cite="mid:87897608-173E-45EB-80A1-8B249706D8A1@tarantool.org">
<div class=""><span style="font-family: HelveticaNeue;" class=""><br
class="">
</span></div>
<div class=""><span style="font-family: HelveticaNeue;" class="">0xfffd
only means that symbol cannot be treated as a unicode symbol.</span></div>
<div class=""><span style="font-family: HelveticaNeue;" class=""><br
class="">
</span></div>
<div class="">
<div class="">Shall I change it somehow then?</div>
</div>
<div class=""><br class="">
</div>
<div class=""><br class="">
<blockquote type="cite" class="">
<div class="">On 17 Aug 2018, at 12:23, Alex Khatskevich <<a
href="mailto:avkhatskevich@tarantool.org" class=""
moz-do-not-send="true">avkhatskevich@tarantool.org</a>>
wrote:</div>
<br class="Apple-interchange-newline">
<div class=""><span class="" style="float: none; display:
inline !important;">I have a look at icu code and It seems
like 0xffff is an error, and it is more similar to</span><br
class="">
<span class="" style="float: none; display: inline
!important;">invalid symbol that to "end of string". Check
it, and fix the code, so that it is treated as</span><br
class="">
<span class="" style="float: none; display: inline
!important;">an error.</span><br class="">
<span class="" style="float: none; display: inline
!important;">For example it is not handled in the main
pattern loop:</span><br class="">
<br class="">
<span class="" style="float: none; display: inline
!important;">+</span><span class="Apple-tab-span" style="white-space: pre;"> </span><span
class="" style="float: none; display: inline !important;">while
(pattern < pattern_end) {</span><br class="">
<span class="Apple-tab-span" style="white-space: pre;"> </span><span class="Apple-tab-span" style="white-space: pre;"> </span><span
class="" style="float: none; display: inline !important;">c
= Utf8Read(pattern, pattern_end);</span><br class="">
<span class="" style="float: none; display: inline
!important;">+</span><span class="Apple-tab-span" style="white-space: pre;"> </span><span class="Apple-tab-span" style="white-space: pre;"> </span><span
class="" style="float: none; display: inline !important;">if
(c == SQL_INVALID_UTF8_SYMBOL)</span><br class="">
<span class="" style="float: none; display: inline
!important;">+</span><span class="Apple-tab-span" style="white-space: pre;"> </span><span class="Apple-tab-span" style="white-space: pre;"> </span><span class="Apple-tab-span" style="white-space: pre;"> </span><span
class="" style="float: none; display: inline !important;">return
SQL_INVALID_PATTERN;</span><br class="">
<br class="">
<span class="" style="float: none; display: inline
!important;">It seems like the 0xffff should be checked
there too.</span></div>
</blockquote>
<br class="">
</div>
<div class="">No, it should not. This way it will only cause a bug
when, for example ’select “” like “”’</div>
<div class="">will be treated as an error.</div>
</blockquote>
I do not understand.<br>
’select “” like “”’ should not even trap inside of the while loop<br>
(because <span class="" style="float: none; display: inline
!important;">`pattern < pattern_end` is false).<br>
</span>
</body>
</html>