<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <br class="">
    <blockquote type="cite"
      cite="mid:87897608-173E-45EB-80A1-8B249706D8A1@tarantool.org">
      <div><br class="">
        <blockquote type="cite" class="">
          <div class="">On 17 Aug 2018, at 14:42, Alex Khatskevich <<a
              href="mailto:avkhatskevich@tarantool.org" class=""
              moz-do-not-send="true">avkhatskevich@tarantool.org</a>>
            wrote:</div>
          <br class="Apple-interchange-newline">
          <div class=""><br style="caret-color: rgb(0, 0, 0);
              font-family: Helvetica; font-size: 12px; font-style:
              normal; font-variant-caps: normal; font-weight: normal;
              letter-spacing: normal; text-align: start; text-indent:
              0px; text-transform: none; white-space: normal;
              word-spacing: 0px; -webkit-text-stroke-width: 0px;
              text-decoration: none;" class="">
            <span style="caret-color: rgb(0, 0, 0); font-family:
              Helvetica; font-size: 12px; font-style: normal;
              font-variant-caps: normal; font-weight: normal;
              letter-spacing: normal; text-align: start; text-indent:
              0px; text-transform: none; white-space: normal;
              word-spacing: 0px; -webkit-text-stroke-width: 0px;
              text-decoration: none; float: none; display: inline
              !important;" class="">On 17.08.2018 14:17, Alexander
              Turenko wrote:</span><br style="caret-color: rgb(0, 0, 0);
              font-family: Helvetica; font-size: 12px; font-style:
              normal; font-variant-caps: normal; font-weight: normal;
              letter-spacing: normal; text-align: start; text-indent:
              0px; text-transform: none; white-space: normal;
              word-spacing: 0px; -webkit-text-stroke-width: 0px;
              text-decoration: none;" class="">
            <blockquote type="cite" style="font-family: Helvetica;
              font-size: 12px; font-style: normal; font-variant-caps:
              normal; font-weight: normal; letter-spacing: normal;
              orphans: auto; text-align: start; text-indent: 0px;
              text-transform: none; white-space: normal; widows: auto;
              word-spacing: 0px; -webkit-text-size-adjust: auto;
              -webkit-text-stroke-width: 0px; text-decoration: none;"
              class="">0xffff is the result of 'end of a string' check
              as well as internal buffer<br class="">
              overflow error. I have the relevant code pasted in the
              first review of<br class="">
              the patch (July, 18).<br class="">
              <br class="">
              // source/common/ucnv.c::ucnv_getNextUChar<br class="">
              1860     s=*source;<br class="">
              1861     if(sourceLimit<s) {<br class="">
              1862         *err=U_ILLEGAL_ARGUMENT_ERROR;<br class="">
              1863         return 0xffff;<br class="">
              1864     }<br class="">
              <br class="">
              We should not handle the buffer overflow case as an
              invalid symbol. Of<br class="">
              course we should not handle it as the 'end of the string'
              situation.<br class="">
              Ideally we should perform pointer myself and raise an
              error in case of<br class="">
              0xffff. I had thought that a buffer overflow error is
              unlikely to meet,<br class="">
              but you are right: we should differentiate these
              situations.<br class="">
              <br class="">
              In one of the previous version of a patch we perform this
              check like so:<br class="">
              <br class="">
              #define Utf8Read(s, e) (((s) < (e)) ?\<br class="">
              <span class="Apple-tab-span" style="white-space: pre;"> </span>ucnv_getNextUChar(pUtf8conv,
              &s, e, &status) : 0)<br class="">
              <br class="">
              Don't sure why it was changed. Maybe it is try to
              correctly handle '\0'<br class="">
              symbol (it is valid unicode character)?<br class="">
            </blockquote>
            <span style="caret-color: rgb(0, 0, 0); font-family:
              Helvetica; font-size: 12px; font-style: normal;
              font-variant-caps: normal; font-weight: normal;
              letter-spacing: normal; text-align: start; text-indent:
              0px; text-transform: none; white-space: normal;
              word-spacing: 0px; -webkit-text-stroke-width: 0px;
              text-decoration: none; float: none; display: inline
              !important;" class="">The define you have pasted can
              return 0xffff.</span><br style="caret-color: rgb(0, 0, 0);
              font-family: Helvetica; font-size: 12px; font-style:
              normal; font-variant-caps: normal; font-weight: normal;
              letter-spacing: normal; text-align: start; text-indent:
              0px; text-transform: none; white-space: normal;
              word-spacing: 0px; -webkit-text-stroke-width: 0px;
              text-decoration: none;" class="">
            <span style="caret-color: rgb(0, 0, 0); font-family:
              Helvetica; font-size: 12px; font-style: normal;
              font-variant-caps: normal; font-weight: normal;
              letter-spacing: normal; text-align: start; text-indent:
              0px; text-transform: none; white-space: normal;
              word-spacing: 0px; -webkit-text-stroke-width: 0px;
              text-decoration: none; float: none; display: inline
              !important;" class="">The reasons to change it back are
              described in the previous patchset.</span><br
              style="caret-color: rgb(0, 0, 0); font-family: Helvetica;
              font-size: 12px; font-style: normal; font-variant-caps:
              normal; font-weight: normal; letter-spacing: normal;
              text-align: start; text-indent: 0px; text-transform: none;
              white-space: normal; word-spacing: 0px;
              -webkit-text-stroke-width: 0px; text-decoration: none;"
              class="">
            <span style="caret-color: rgb(0, 0, 0); font-family:
              Helvetica; font-size: 12px; font-style: normal;
              font-variant-caps: normal; font-weight: normal;
              letter-spacing: normal; text-align: start; text-indent:
              0px; text-transform: none; white-space: normal;
              word-spacing: 0px; -webkit-text-stroke-width: 0px;
              text-decoration: none; float: none; display: inline
              !important;" class="">In short:</span><br
              style="caret-color: rgb(0, 0, 0); font-family: Helvetica;
              font-size: 12px; font-style: normal; font-variant-caps:
              normal; font-weight: normal; letter-spacing: normal;
              text-align: start; text-indent: 0px; text-transform: none;
              white-space: normal; word-spacing: 0px;
              -webkit-text-stroke-width: 0px; text-decoration: none;"
              class="">
            <span style="caret-color: rgb(0, 0, 0); font-family:
              Helvetica; font-size: 12px; font-style: normal;
              font-variant-caps: normal; font-weight: normal;
              letter-spacing: normal; text-align: start; text-indent:
              0px; text-transform: none; white-space: normal;
              word-spacing: 0px; -webkit-text-stroke-width: 0px;
              text-decoration: none; float: none; display: inline
              !important;" class="">1. It is equivalent to</span><br
              style="caret-color: rgb(0, 0, 0); font-family: Helvetica;
              font-size: 12px; font-style: normal; font-variant-caps:
              normal; font-weight: normal; letter-spacing: normal;
              text-align: start; text-indent: 0px; text-transform: none;
              white-space: normal; word-spacing: 0px;
              -webkit-text-stroke-width: 0px; text-decoration: none;"
              class="">
            <span style="caret-color: rgb(0, 0, 0); font-family:
              Helvetica; font-size: 12px; font-style: normal;
              font-variant-caps: normal; font-weight: normal;
              letter-spacing: normal; text-align: start; text-indent:
              0px; text-transform: none; white-space: normal;
              word-spacing: 0px; -webkit-text-stroke-width: 0px;
              text-decoration: none; float: none; display: inline
              !important;" class="">   a. check s < e in a while loop</span><br
              style="caret-color: rgb(0, 0, 0); font-family: Helvetica;
              font-size: 12px; font-style: normal; font-variant-caps:
              normal; font-weight: normal; letter-spacing: normal;
              text-align: start; text-indent: 0px; text-transform: none;
              white-space: normal; word-spacing: 0px;
              -webkit-text-stroke-width: 0px; text-decoration: none;"
              class="">
            <span style="caret-color: rgb(0, 0, 0); font-family:
              Helvetica; font-size: 12px; font-style: normal;
              font-variant-caps: normal; font-weight: normal;
              letter-spacing: normal; text-align: start; text-indent:
              0px; text-transform: none; white-space: normal;
              word-spacing: 0px; -webkit-text-stroke-width: 0px;
              text-decoration: none; float: none; display: inline
              !important;" class="">   b. read next character inside of
              where loop body.</span><br style="caret-color: rgb(0, 0,
              0); font-family: Helvetica; font-size: 12px; font-style:
              normal; font-variant-caps: normal; font-weight: normal;
              letter-spacing: normal; text-align: start; text-indent:
              0px; text-transform: none; white-space: normal;
              word-spacing: 0px; -webkit-text-stroke-width: 0px;
              text-decoration: none;" class="">
            <span style="caret-color: rgb(0, 0, 0); font-family:
              Helvetica; font-size: 12px; font-style: normal;
              font-variant-caps: normal; font-weight: normal;
              letter-spacing: normal; text-align: start; text-indent:
              0px; text-transform: none; white-space: normal;
              word-spacing: 0px; -webkit-text-stroke-width: 0px;
              text-decoration: none; float: none; display: inline
              !important;" class="">2. In some usages of the code this
              check (s<e) was redundant (it was performed a couple
              lines above)</span><br style="caret-color: rgb(0, 0, 0);
              font-family: Helvetica; font-size: 12px; font-style:
              normal; font-variant-caps: normal; font-weight: normal;
              letter-spacing: normal; text-align: start; text-indent:
              0px; text-transform: none; white-space: normal;
              word-spacing: 0px; -webkit-text-stroke-width: 0px;
              text-decoration: none;" class="">
            <span style="caret-color: rgb(0, 0, 0); font-family:
              Helvetica; font-size: 12px; font-style: normal;
              font-variant-caps: normal; font-weight: normal;
              letter-spacing: normal; text-align: start; text-indent:
              0px; text-transform: none; white-space: normal;
              word-spacing: 0px; -webkit-text-stroke-width: 0px;
              text-decoration: none; float: none; display: inline
              !important;" class="">3. There is no reason to rewrite the
              old version of this function. (So, we decided to use old
              version of the function)</span><br style="caret-color:
              rgb(0, 0, 0); font-family: Helvetica; font-size: 12px;
              font-style: normal; font-variant-caps: normal;
              font-weight: normal; letter-spacing: normal; text-align:
              start; text-indent: 0px; text-transform: none;
              white-space: normal; word-spacing: 0px;
              -webkit-text-stroke-width: 0px; text-decoration: none;"
              class="">
            <blockquote type="cite" style="font-family: Helvetica;
              font-size: 12px; font-style: normal; font-variant-caps:
              normal; font-weight: normal; letter-spacing: normal;
              orphans: auto; text-align: start; text-indent: 0px;
              text-transform: none; white-space: normal; widows: auto;
              word-spacing: 0px; -webkit-text-size-adjust: auto;
              -webkit-text-stroke-width: 0px; text-decoration: none;"
              class="">So I see two ways to proceed:<br class="">
              <br class="">
              1. Lean on icu's check and ignore possibility of the
              buffer overflow.<br class="">
              2. Use our own check and possibly meet '\0' problems.<br
                class="">
              3. Check for U_ILLEGAL_ARGUMENT_ERROR to treat as end of a
              string, raise<br class="">
                 the error for other 0xffff.<br class="">
              <br class="">
              Alex, what do you suggests here?<br class="">
            </blockquote>
            <span style="caret-color: rgb(0, 0, 0); font-family:
              Helvetica; font-size: 12px; font-style: normal;
              font-variant-caps: normal; font-weight: normal;
              letter-spacing: normal; text-align: start; text-indent:
              0px; text-transform: none; white-space: normal;
              word-spacing: 0px; -webkit-text-stroke-width: 0px;
              text-decoration: none; float: none; display: inline
              !important;" class="">As I understand, by now the 0xffff
              is used ONLY to handle the case of unexpectedly ended
              symbol.</span><br style="caret-color: rgb(0, 0, 0);
              font-family: Helvetica; font-size: 12px; font-style:
              normal; font-variant-caps: normal; font-weight: normal;
              letter-spacing: normal; text-align: start; text-indent:
              0px; text-transform: none; white-space: normal;
              word-spacing: 0px; -webkit-text-stroke-width: 0px;
              text-decoration: none;" class="">
            <span style="caret-color: rgb(0, 0, 0); font-family:
              Helvetica; font-size: 12px; font-style: normal;
              font-variant-caps: normal; font-weight: normal;
              letter-spacing: normal; text-align: start; text-indent:
              0px; text-transform: none; white-space: normal;
              word-spacing: 0px; -webkit-text-stroke-width: 0px;
              text-decoration: none; float: none; display: inline
              !important;" class="">E.g. some symbol consists of 2
              characters, but the length of the input buffer is 1.</span><br
              style="caret-color: rgb(0, 0, 0); font-family: Helvetica;
              font-size: 12px; font-style: normal; font-variant-caps:
              normal; font-weight: normal; letter-spacing: normal;
              text-align: start; text-indent: 0px; text-transform: none;
              white-space: normal; word-spacing: 0px;
              -webkit-text-stroke-width: 0px; text-decoration: none;"
              class="">
            <span style="caret-color: rgb(0, 0, 0); font-family:
              Helvetica; font-size: 12px; font-style: normal;
              font-variant-caps: normal; font-weight: normal;
              letter-spacing: normal; text-align: start; text-indent:
              0px; text-transform: none; white-space: normal;
              word-spacing: 0px; -webkit-text-stroke-width: 0px;
              text-decoration: none; float: none; display: inline
              !important;" class="">In my opinion this is the same as an
              invalid symbol.</span><br style="caret-color: rgb(0, 0,
              0); font-family: Helvetica; font-size: 12px; font-style:
              normal; font-variant-caps: normal; font-weight: normal;
              letter-spacing: normal; text-align: start; text-indent:
              0px; text-transform: none; white-space: normal;
              word-spacing: 0px; -webkit-text-stroke-width: 0px;
              text-decoration: none;" class="">
            <br style="caret-color: rgb(0, 0, 0); font-family:
              Helvetica; font-size: 12px; font-style: normal;
              font-variant-caps: normal; font-weight: normal;
              letter-spacing: normal; text-align: start; text-indent:
              0px; text-transform: none; white-space: normal;
              word-spacing: 0px; -webkit-text-stroke-width: 0px;
              text-decoration: none;" class="">
            <span style="caret-color: rgb(0, 0, 0); font-family:
              Helvetica; font-size: 12px; font-style: normal;
              font-variant-caps: normal; font-weight: normal;
              letter-spacing: normal; text-align: start; text-indent:
              0px; text-transform: none; white-space: normal;
              word-spacing: 0px; -webkit-text-stroke-width: 0px;
              text-decoration: none; float: none; display: inline
              !important;" class="">I guess that internal buffer
              overflow cannot occur in the `ucnv_getNextChar` function.</span><br
              style="caret-color: rgb(0, 0, 0); font-family: Helvetica;
              font-size: 12px; font-style: normal; font-variant-caps:
              normal; font-weight: normal; letter-spacing: normal;
              text-align: start; text-indent: 0px; text-transform: none;
              white-space: normal; word-spacing: 0px;
              -webkit-text-stroke-width: 0px; text-decoration: none;"
              class="">
            <br style="caret-color: rgb(0, 0, 0); font-family:
              Helvetica; font-size: 12px; font-style: normal;
              font-variant-caps: normal; font-weight: normal;
              letter-spacing: normal; text-align: start; text-indent:
              0px; text-transform: none; white-space: normal;
              word-spacing: 0px; -webkit-text-stroke-width: 0px;
              text-decoration: none;" class="">
            <span style="caret-color: rgb(0, 0, 0); font-family:
              Helvetica; font-size: 12px; font-style: normal;
              font-variant-caps: normal; font-weight: normal;
              letter-spacing: normal; text-align: start; text-indent:
              0px; text-transform: none; white-space: normal;
              word-spacing: 0px; -webkit-text-stroke-width: 0px;
              text-decoration: none; float: none; display: inline
              !important;" class="">I suppose that it is Nikitas duty to
              investigate this problem and explain it to us all. I just
              have noticed a strange usage.</span></div>
        </blockquote>
      </div>
      <div class=""><br class="">
      </div>
      <div class="">Hello, please consider my comments.</div>
      <div class=""><br class="">
      </div>
      <div class="">There are some cases when 0xffff can occur, but:</div>
      <div class=""><span class="Apple-tab-span" style="white-space:pre">     </span>1) <span
          style="font-family: HelveticaNeue;" class="">Cannot trigger in
          our context.</span></div>
      <div class=""><span style="font-family: HelveticaNeue;" class=""><span class="Apple-tab-span" style="white-space:pre">        </span>2)
          C</span><span style="font-family: HelveticaNeue;" class="">annot
          trigger in our context.</span></div>
      <div class=""><span style="font-family: HelveticaNeue;" class=""><span class="Apple-tab-span" style="white-space:pre">        </span>3)
          O</span><span style="font-family: HelveticaNeue;" class="">nly
          triggers if end < start. (Cannot happen in
          sql_utf8_pattern_compare, i guess)</span></div>
      <div class=""><span style="font-family: HelveticaNeue;" class=""><span class="Apple-tab-span" style="white-space:pre">        </span>4)
          O</span><span style="font-family: HelveticaNeue;" class="">nly
          triggers if string length > (size_t) 0x7ffffffff (can it
          actually happen? I don’t think so).</span></div>
      <div class=""><span style="font-family: HelveticaNeue;" class=""><span class="Apple-tab-span" style="white-space:pre">        </span>5)
          O</span><span style="font-family: HelveticaNeue;" class="">ccurs
          when trying to access to not unindexed data.</span></div>
      <div class=""><span style="font-family: HelveticaNeue;" class=""><span class="Apple-tab-span" style="white-space:pre">        </span>6)
          Cannot occur in our context.</span></div>
      <div class=""><span style="font-family: HelveticaNeue;" class=""><span class="Apple-tab-span" style="white-space:pre">        </span>7) </span><span
          style="font-family: HelveticaNeue;" class="">Cannot occur in
          our context.</span></div>
    </blockquote>
    I do not understand what are those numbers related to. Please,
    describe it.<br>
    <blockquote type="cite"
      cite="mid:87897608-173E-45EB-80A1-8B249706D8A1@tarantool.org">
      <div class=""><span style="font-family: HelveticaNeue;" class=""><br
            class="">
        </span></div>
      <div class=""><span style="font-family: HelveticaNeue;" class="">0xfffd
          only means that symbol cannot be treated as a unicode symbol.</span></div>
      <div class=""><span style="font-family: HelveticaNeue;" class=""><br
            class="">
        </span></div>
      <div class="">
        <div class="">Shall I change it somehow then?</div>
      </div>
      <div class=""><br class="">
      </div>
      <div class=""><br class="">
        <blockquote type="cite" class="">
          <div class="">On 17 Aug 2018, at 12:23, Alex Khatskevich <<a
              href="mailto:avkhatskevich@tarantool.org" class=""
              moz-do-not-send="true">avkhatskevich@tarantool.org</a>>
            wrote:</div>
          <br class="Apple-interchange-newline">
          <div class=""><span class="" style="float: none; display:
              inline !important;">I have a look at icu code and It seems
              like 0xffff is an error, and it is more similar to</span><br
              class="">
            <span class="" style="float: none; display: inline
              !important;">invalid symbol that to "end of string". Check
              it, and fix the code, so that it is treated as</span><br
              class="">
            <span class="" style="float: none; display: inline
              !important;">an error.</span><br class="">
            <span class="" style="float: none; display: inline
              !important;">For example it is not handled in the main
              pattern loop:</span><br class="">
            <br class="">
            <span class="" style="float: none; display: inline
              !important;">+</span><span class="Apple-tab-span" style="white-space: pre;">      </span><span
              class="" style="float: none; display: inline !important;">while
              (pattern < pattern_end) {</span><br class="">
            <span class="Apple-tab-span" style="white-space: pre;">   </span><span class="Apple-tab-span" style="white-space: pre;">  </span><span
              class="" style="float: none; display: inline !important;">c
              = Utf8Read(pattern, pattern_end);</span><br class="">
            <span class="" style="float: none; display: inline
              !important;">+</span><span class="Apple-tab-span" style="white-space: pre;">      </span><span class="Apple-tab-span" style="white-space: pre;">  </span><span
              class="" style="float: none; display: inline !important;">if
              (c == SQL_INVALID_UTF8_SYMBOL)</span><br class="">
            <span class="" style="float: none; display: inline
              !important;">+</span><span class="Apple-tab-span" style="white-space: pre;">      </span><span class="Apple-tab-span" style="white-space: pre;">  </span><span class="Apple-tab-span" style="white-space: pre;">  </span><span
              class="" style="float: none; display: inline !important;">return
              SQL_INVALID_PATTERN;</span><br class="">
            <br class="">
            <span class="" style="float: none; display: inline
              !important;">It seems like the 0xffff should be checked
              there too.</span></div>
        </blockquote>
        <br class="">
      </div>
      <div class="">No, it should not. This way it will only cause a bug
        when, for example ’select “” like “”’</div>
      <div class="">will be treated as an error.</div>
    </blockquote>
    I do not understand.<br>
    ’select “” like “”’ should not even trap inside of the while loop<br>
    (because <span class="" style="float: none; display: inline
      !important;">`pattern < pattern_end` is false).<br>
    </span>
  </body>
</html>