From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from localhost (localhost [127.0.0.1]) by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTP id B5A7D2CF60 for ; Thu, 18 Oct 2018 14:02:53 -0400 (EDT) Received: from turing.freelists.org ([127.0.0.1]) by localhost (turing.freelists.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id i2H9zKEQrLCw for ; Thu, 18 Oct 2018 14:02:53 -0400 (EDT) Received: from smtp51.i.mail.ru (smtp51.i.mail.ru [94.100.177.111]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTPS id 99DAD2CE3E for ; Thu, 18 Oct 2018 14:02:52 -0400 (EDT) From: Nikita Tatunov Message-Id: Content-Type: multipart/alternative; boundary="Apple-Mail=_2CB2BB06-C3E3-4597-869F-1FA2BBD67596" Mime-Version: 1.0 (Mac OS X Mail 11.5 \(3445.9.1\)) Subject: [tarantool-patches] Re: [PATCH 1/2] sql: LIKE & GLOB pattern comparison issue Date: Thu, 18 Oct 2018 21:02:42 +0300 In-Reply-To: <45338A27-C589-4330-B206-A4E379A4DE75@tarantool.org> References: <43febf82af3702fadfea135db978ffb6426eb00d.1534436836.git.n.tatunov@tarantool.org> <20180817111727.y6nsbblpm5nh4n3g@tkn_work_nb> <436d256a-f9d0-781f-8cad-179d7322c7bd@tarantool.org> <87897608-173E-45EB-80A1-8B249706D8A1@tarantool.org> <6a1352e9-425c-d656-1bec-bb04d9f0fee6@tarantool.org> <58B407E2-AF5D-4531-A9FF-9DC57CE0070B@tarantool.org> <860a125b-19f3-3bf1-8705-25156ff508ab@tarantool.org> <45338A27-C589-4330-B206-A4E379A4DE75@tarantool.org> Sender: tarantool-patches-bounce@freelists.org Errors-to: tarantool-patches-bounce@freelists.org Reply-To: tarantool-patches@freelists.org List-help: List-unsubscribe: List-software: Ecartis version 1.0.0 List-Id: tarantool-patches List-subscribe: List-owner: List-post: List-archive: To: tarantool-patches@freelists.org Cc: Alexander Turenko , korablev@tarantool.org --Apple-Mail=_2CB2BB06-C3E3-4597-869F-1FA2BBD67596 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=utf-8 > On 11 Sep 2018, at 16:31, Nikita Tatunov = wrote: >=20 >=20 >=20 >> On 11 Sep 2018, at 13:06, Alex Khatskevich = > = wrote: >>=20 >>=20 >>=20 >>=20 >> On 11.09.2018 09:06, Nikita Tatunov wrote: >>>=20 >>>=20 >>>> On 11 Sep 2018, at 01:20, Alex Khatskevich = > = wrote: >>>>=20 >>>>>=20 >>>>>=20 >>>>>> On 17 Aug 2018, at 14:42, Alex Khatskevich = > = wrote: >>>>>>=20 >>>>>>=20 >>>>>> On 17.08.2018 14:17, Alexander Turenko wrote: >>>>>>> 0xffff is the result of 'end of a string' check as well as = internal buffer >>>>>>> overflow error. I have the relevant code pasted in the first = review of >>>>>>> the patch (July, 18). >>>>>>>=20 >>>>>>> // source/common/ucnv.c::ucnv_getNextUChar >>>>>>> 1860 s=3D*source; >>>>>>> 1861 if(sourceLimit>>>>>> 1862 *err=3DU_ILLEGAL_ARGUMENT_ERROR; >>>>>>> 1863 return 0xffff; >>>>>>> 1864 } >>>>>>>=20 >>>>>>> We should not handle the buffer overflow case as an invalid = symbol. Of >>>>>>> course we should not handle it as the 'end of the string' = situation. >>>>>>> Ideally we should perform pointer myself and raise an error in = case of >>>>>>> 0xffff. I had thought that a buffer overflow error is unlikely = to meet, >>>>>>> but you are right: we should differentiate these situations. >>>>>>>=20 >>>>>>> In one of the previous version of a patch we perform this check = like so: >>>>>>>=20 >>>>>>> #define Utf8Read(s, e) (((s) < (e)) ?\ >>>>>>> ucnv_getNextUChar(pUtf8conv, &s, e, &status) : 0) >>>>>>>=20 >>>>>>> Don't sure why it was changed. Maybe it is try to correctly = handle '\0' >>>>>>> symbol (it is valid unicode character)? >>>>>> The define you have pasted can return 0xffff. >>>>>> The reasons to change it back are described in the previous = patchset. >>>>>> In short: >>>>>> 1. It is equivalent to >>>>>> a. check s < e in a while loop >>>>>> b. read next character inside of where loop body. >>>>>> 2. In some usages of the code this check (s>>>>> 3. There is no reason to rewrite the old version of this = function. (So, we decided to use old version of the function) >>>>>>> So I see two ways to proceed: >>>>>>>=20 >>>>>>> 1. Lean on icu's check and ignore possibility of the buffer = overflow. >>>>>>> 2. Use our own check and possibly meet '\0' problems. >>>>>>> 3. Check for U_ILLEGAL_ARGUMENT_ERROR to treat as end of a = string, raise >>>>>>> the error for other 0xffff. >>>>>>>=20 >>>>>>> Alex, what do you suggests here? >>>>>> As I understand, by now the 0xffff is used ONLY to handle the = case of unexpectedly ended symbol. >>>>>> E.g. some symbol consists of 2 characters, but the length of the = input buffer is 1. >>>>>> In my opinion this is the same as an invalid symbol. >>>>>>=20 >>>>>> I guess that internal buffer overflow cannot occur in the = `ucnv_getNextChar` function. >>>>>>=20 >>>>>> I suppose that it is Nikitas duty to investigate this problem and = explain it to us all. I just have noticed a strange usage. >>>>>=20 >>>>>=20 >>>>> Hello, please consider my comments. >>>>>=20 >>>>> There are some cases when 0xffff can occur, but: >>>>> 1) Cannot trigger in our context. >>>>> 2) Cannot trigger in our context. >>>>> 3) Only triggers if end < start. (Cannot happen in = sql_utf8_pattern_compare, i guess) >>>>> 4) Only triggers if string length > (size_t) 0x7ffffffff (can it = actually happen? I don=E2=80=99t think so). >>>>> 5) Occurs when trying to access to not unindexed data. >>>>> 6) Cannot occur in our context. >>>>> 7) Cannot occur in our context. >>>> I do not understand what are those numbers related to. Please, = describe it. >>>=20 >>> They are related to possible cases returning 0xffff from icu source = code (function ucnv_getNextUChar()). >> Can you just copy it here, so that anyone interested in that = conversation can >> analyze it without looking for source files? >=20 Hello Alexander! Could you please consider some following explanations? I think we can treat 0xffff as a string end in sql_utf8_pattern_compare. Though we can use `status` for differentiating different errors from `ucnv_getNextUChar(pUtf8conv, &s, e, &status)`. I would like to know = your opinion. > Ok then: >=20 > U_CAPI UChar32 U_EXPORT2 > ucnv_getNextUChar(UConverter *cnv, > const char **source, const char *sourceLimit, > UErrorCode *err) { > UConverterToUnicodeArgs args; > UChar buffer[U16_MAX_LENGTH]; > const char *s; > UChar32 c; > int32_t i, length; >=20 > /* check parameters */ > if(err=3D=3DNULL || U_FAILURE(*err)) { > return 0xffff; > } >=20 1. This one cannot trigger in sql_utf8_pattern_compare(): 1) err =3D=3D &status 2) *err =3D=3D U_ZERO_ERROR > if(cnv=3D=3DNULL || source=3D=3DNULL) { > *err=3DU_ILLEGAL_ARGUMENT_ERROR; > return 0xffff; > } >=20 2. This one cannot trigger in sql_utf8_pattern_compare(): 1) cnv !=3D NULL 2) source !=3D NULL > s=3D*source; > if(sourceLimit *err=3DU_ILLEGAL_ARGUMENT_ERROR; > return 0xffff; 3. This one cannot trigger in sql_utf8_pattern_compare(): 1) ucnv_getNextUChar is only called when !(sourceLimit } >=20 > /* > * Make sure that the buffer sizes do not exceed the number range = for > * int32_t because some functions use the size (in units or bytes) > * rather than comparing pointers, and because offsets are int32_t = values. > * > * size_t is guaranteed to be unsigned and large enough for the = job. > * > * Return with an error instead of adjusting the limits because we = would > * not be able to maintain the semantics that either the source = must be > * consumed or the target filled (unless an error occurs). > * An adjustment would be sourceLimit=3Dt+0x7fffffff; for example. > */ > if(((size_t)(sourceLimit-s)>(size_t)0x7fffffff && sourceLimit>s)) = { > *err=3DU_ILLEGAL_ARGUMENT_ERROR; > return 0xffff; 4. I=E2=80=99m not sure if string data can be this long in our context.=20= (string length > (size_t) 0x7ffffffff) > } >=20 > c=3DU_SENTINEL; >=20 > /* flush the target overflow buffer */ > if(cnv->UCharErrorBufferLength>0) { > UChar *overflow; >=20 > overflow=3Dcnv->UCharErrorBuffer; > i=3D0; > length=3Dcnv->UCharErrorBufferLength; > U16_NEXT(overflow, i, length, c); >=20 > /* move the remaining overflow contents up to the beginning */ > if((cnv->UCharErrorBufferLength=3D(int8_t)(length-i))>0) { > uprv_memmove(cnv->UCharErrorBuffer, = cnv->UCharErrorBuffer+i, > cnv->UCharErrorBufferLength*U_SIZEOF_UCHAR); > } >=20 > if(!U16_IS_LEAD(c) || i return c; > } > /* > * Continue if the overflow buffer contained only a lead = surrogate, > * in case the converter outputs single surrogates from = complete > * input sequences. > */ > } >=20 > /* > * flush=3D=3DTRUE is implied for ucnv_getNextUChar() > * > * do not simply return even if s=3D=3DsourceLimit because the = converter may > * not have seen flush=3D=3DTRUE before > */ >=20 > /* prepare the converter arguments */ > args.converter=3Dcnv; > args.flush=3DTRUE; > args.offsets=3DNULL; > args.source=3Ds; > args.sourceLimit=3DsourceLimit; > args.target=3Dbuffer; > args.targetLimit=3Dbuffer+1; > args.size=3Dsizeof(args); >=20 c =3D=3D U_SENTINEL =3D=3D -1 > if(c<0) { > /* > * call the native getNextUChar() implementation if we are > * at a character boundary (toULength=3D=3D0) > * > * unlike with _toUnicode(), getNextUChar() implementations = must set > * U_TRUNCATED_CHAR_FOUND for truncated input, > * in addition to setting toULength/toUBytes[] > */ > if(cnv->toULength=3D=3D0 && = cnv->sharedData->impl->getNextUChar!=3DNULL) { > c=3Dcnv->sharedData->impl->getNextUChar(&args, err); > *source=3Ds=3Dargs.source; > if(*err=3D=3DU_INDEX_OUTOFBOUNDS_ERROR) { > /* reset the converter without calling the callback = function */ > _reset(cnv, UCNV_RESET_TO_UNICODE, FALSE); > return 0xffff; /* no output */ 5. Occurs when trying to access unindexed data. > } else if(U_SUCCESS(*err) && c>=3D0) { > return c; 6. Returns symbol (can also be 0xfffd, as it is not treated as an actual = error). So if I=E2=80=99m not mistaken we will get results in our function = either from =E2=80=98return=E2=80=99 number 5 or number 6 and the following code = will not be executed. > /* > * else fall through to use _toUnicode() because > * UCNV_GET_NEXT_UCHAR_USE_TO_U: the native function did = not want to handle it after all > * U_FAILURE: call _toUnicode() for callback handling = (do not output c) > */ > } > } >=20 > /* convert to one UChar in buffer[0], or handle getNextUChar() = errors */ > _toUnicodeWithCallback(&args, err); >=20 > if(*err=3D=3DU_BUFFER_OVERFLOW_ERROR) { > *err=3DU_ZERO_ERROR; > } >=20 > i=3D0; > length=3D(int32_t)(args.target-buffer); > } else { > /* write the lead surrogate from the overflow buffer */ > buffer[0]=3D(UChar)c; > args.target=3Dbuffer+1; > i=3D0; > length=3D1; > } >=20 > /* buffer contents starts at i and ends before length */ >=20 > if(U_FAILURE(*err)) { > c=3D0xffff; /* no output */ > } else if(length=3D=3D0) { > /* no input or only state changes */ > *err=3DU_INDEX_OUTOFBOUNDS_ERROR; > /* no need to reset explicitly because = _toUnicodeWithCallback() did it */ > c=3D0xffff; /* no output */ > } else { > c=3Dbuffer[0]; > i=3D1; > if(!U16_IS_LEAD(c)) { > /* consume c=3Dbuffer[0], done */ > } else { > /* got a lead surrogate, see if a trail surrogate follows = */ > UChar c2; >=20 > if(cnv->UCharErrorBufferLength>0) { > /* got overflow output from the conversion */ > if(U16_IS_TRAIL(c2=3Dcnv->UCharErrorBuffer[0])) { > /* got a trail surrogate, too */ > c=3DU16_GET_SUPPLEMENTARY(c, c2); >=20 > /* move the remaining overflow contents up to the = beginning */ > if((--cnv->UCharErrorBufferLength)>0) { > uprv_memmove(cnv->UCharErrorBuffer, = cnv->UCharErrorBuffer+1, > = cnv->UCharErrorBufferLength*U_SIZEOF_UCHAR); > } > } else { > /* c is an unpaired lead surrogate, just return it = */ > } > } else if(args.source /* convert once more, to buffer[1] */ > args.targetLimit=3Dbuffer+2; > _toUnicodeWithCallback(&args, err); > if(*err=3D=3DU_BUFFER_OVERFLOW_ERROR) { > *err=3DU_ZERO_ERROR; > } >=20 > length=3D(int32_t)(args.target-buffer); > if(U_SUCCESS(*err) && length=3D=3D2 && = U16_IS_TRAIL(c2=3Dbuffer[1])) { > /* got a trail surrogate, too */ > c=3DU16_GET_SUPPLEMENTARY(c, c2); > i=3D2; > } > } > } > } >=20 > /* > * move leftover output from buffer[i..length[ > * into the beginning of the overflow buffer > */ > if(i /* move further overflow back */ > int32_t delta=3Dlength-i; > if((length=3Dcnv->UCharErrorBufferLength)>0) { > uprv_memmove(cnv->UCharErrorBuffer+delta, = cnv->UCharErrorBuffer, > length*U_SIZEOF_UCHAR); > } > cnv->UCharErrorBufferLength=3D(int8_t)(length+delta); >=20 > cnv->UCharErrorBuffer[0]=3Dbuffer[i++]; > if(delta>1) { > cnv->UCharErrorBuffer[1]=3Dbuffer[i]; > } > } >=20 > *source=3Dargs.source; > return c; > } >=20 > -- > WBR, Nikita Tatunov. > n.tatunov@tarantool.org -- WBR, Nikita Tatunov. n.tatunov@tarantool.org --Apple-Mail=_2CB2BB06-C3E3-4597-869F-1FA2BBD67596 Content-Transfer-Encoding: quoted-printable Content-Type: text/html; charset=utf-8


On 11 = Sep 2018, at 16:31, Nikita Tatunov <n.tatunov@tarantool.org> wrote:



On 11 Sep 2018, at 13:06, Alex = Khatskevich <avkhatskevich@tarantool.org> wrote:



On 11.09.2018 09:06, Nikita = Tatunov wrote:


On 11 Sep 2018, at 01:20, Alex = Khatskevich <avkhatskevich@tarantool.org> = wrote:



On 17 = Aug 2018, at 14:42, Alex Khatskevich <avkhatskevich@tarantool.org> = wrote:


On 17.08.2018 14:17, = Alexander Turenko wrote:
0xffff is the result of 'end of a string' check = as well as internal buffer
overflow error. I have the = relevant code pasted in the first review of
the patch = (July, 18).

// = source/common/ucnv.c::ucnv_getNextUChar
1860 =     s=3D*source;
1861 =     if(sourceLimit<s) {
1862 =         *err=3DU_ILLEGAL_ARGUMENT_= ERROR;
1863 =         return 0xffff;
1864     }

We = should not handle the buffer overflow case as an invalid symbol. Of
course we should not handle it as the 'end of the string' = situation.
Ideally we should perform pointer myself and = raise an error in case of
0xffff. I had thought that a = buffer overflow error is unlikely to meet,
but you are = right: we should differentiate these situations.

In one of the previous version of a patch we perform this = check like so:

#define Utf8Read(s, e) (((s) = < (e)) ?\
ucnv_getNextUChar(pUtf8conv, = &s, e, &status) : 0)

Don't sure why = it was changed. Maybe it is try to correctly handle '\0'
symbol (it is valid unicode character)?
The = define you have pasted can return 0xffff.
The reasons to change it back are described = in the previous patchset.
In = short:
1. It = is equivalent to
   a. check s < e in a = while loop
   b. read next character = inside of where loop body.
2. In = some usages of the code this check (s<e) was redundant (it was = performed a couple lines above)
3. = There is no reason to rewrite the old version of this function. (So, we = decided to use old version of the function)
So I see two = ways to proceed:

1. Lean on icu's check and = ignore possibility of the buffer overflow.
2. Use our own = check and possibly meet '\0' problems.
3. Check for = U_ILLEGAL_ARGUMENT_ERROR to treat as end of a string, raise
   the error for other 0xffff.
Alex, what do you suggests here?
As I = understand, by now the 0xffff is used ONLY to handle the case of = unexpectedly ended symbol.
E.g. = some symbol consists of 2 characters, but the length of the input buffer = is 1.
In my = opinion this is the same as an invalid symbol.

I guess that internal buffer overflow = cannot occur in the `ucnv_getNextChar` function.

I suppose that it is Nikitas duty to = investigate this problem and explain it to us all. I just have noticed a = strange usage.

Hello, please consider my = comments.

There = are some cases when 0xffff can occur, but:
= 1) Cannot trigger in our context.
2) = Cannot = trigger in our context.
3) Only triggers if end < start. = (Cannot happen in sql_utf8_pattern_compare, i guess)
4) = Only = triggers if string length > (size_t) 0x7ffffffff (can it actually = happen? I don=E2=80=99t think so).
5) = Occurs = when trying to access to not unindexed data.
6) Cannot = occur in our context.
7) Cannot occur in our = context.
I do not understand what are those = numbers related to. Please, describe it.

They are related to possible cases = returning 0xffff from icu source code (function = ucnv_getNextUChar()).
Can you just copy it here, = so that anyone interested in that conversation can
analyze = it without looking for source files?


Hello = Alexander! Could you please consider some following = explanations?
I think we can treat 0xffff as a string end in = sql_utf8_pattern_compare.
Though we can use `status` for = differentiating different errors = from
`ucnv_getNextUChar(pUtf8conv, &s, e, &status)`. I = would like to know your opinion.

Ok then:

U_CAPI UChar32 U_EXPORT2
ucnv_getNextUChar(UConverter *cnv,
  =                 const char = **source, const char *sourceLimit,
    =               UErrorCode *err) = {
    UConverterToUnicodeArgs = args;
    UChar = buffer[U16_MAX_LENGTH];
    const char = *s;
    UChar32 c;
    int32_t i, length;

    /* check parameters = */
    if(err=3D=3DNULL || = U_FAILURE(*err)) {
        = return 0xffff;
    }


1. This one cannot trigger in = sql_utf8_pattern_compare():
1) err =3D=3D = &status
2) *err =3D=3D U_ZERO_ERROR

    if(cnv=3D=3DNULL || source=3D=3DNULL) = {
        = *err=3DU_ILLEGAL_ARGUMENT_ERROR;
    =     return 0xffff;
    = }


2. This one cannot trigger in = sql_utf8_pattern_compare():
1) cnv !=3D NULL
2) source = !=3D NULL

    = s=3D*source;
    if(sourceLimit<s) = {
        = *err=3DU_ILLEGAL_ARGUMENT_ERROR;
    =     return 0xffff;

3. This one cannot trigger in = sql_utf8_pattern_compare():
1) ucnv_getNextUChar is only = called when !(sourceLimit<s).

    = }

  =   /*
     * Make sure that the = buffer sizes do not exceed the number range for
     * int32_t because some functions use the = size (in units or bytes)
     * = rather than comparing pointers, and because offsets are int32_t = values.
     *
     * size_t is guaranteed to be unsigned and = large enough for the job.
    =  *
     * Return with an error = instead of adjusting the limits because we would
     * not be able to maintain the semantics = that either the source must be
    =  * consumed or the target filled (unless an error = occurs).
     * An adjustment would = be sourceLimit=3Dt+0x7fffffff; for example.
  =    */
    = if(((size_t)(sourceLimit-s)>(size_t)0x7fffffff && = sourceLimit>s)) {
        = *err=3DU_ILLEGAL_ARGUMENT_ERROR;
    =     return 0xffff;

4. I=E2=80=99m not sure if string data can be this = long in our context. 
    (string length > = (size_t) 0x7ffffffff)

    = }

  =   c=3DU_SENTINEL;

    /* flush the target overflow buffer = */
    = if(cnv->UCharErrorBufferLength>0) {
  =       UChar *overflow;

        = overflow=3Dcnv->UCharErrorBuffer;
    =     i=3D0;
        = length=3Dcnv->UCharErrorBufferLength;
  =       U16_NEXT(overflow, i, length, c);

      =   /* move the remaining overflow contents up to the beginning = */
        = if((cnv->UCharErrorBufferLength=3D(int8_t)(length-i))>0) = {
            = uprv_memmove(cnv->UCharErrorBuffer, = cnv->UCharErrorBuffer+i,
      =                   =  cnv->UCharErrorBufferLength*U_SIZEOF_UCHAR);
        }

        = if(!U16_IS_LEAD(c) || i<length) {
    =         return c;
    =     }
        = /*
         * Continue if = the overflow buffer contained only a lead surrogate,
         * in case the converter = outputs single surrogates from complete
  =        * input sequences.
  =        */
    = }

  =   /*
     * flush=3D=3DTRUE is = implied for ucnv_getNextUChar()
    =  *
     * do not simply return = even if s=3D=3DsourceLimit because the converter may
     * not have seen flush=3D=3DTRUE = before
     */

    /* prepare = the converter arguments */
    = args.converter=3Dcnv;
    = args.flush=3DTRUE;
    = args.offsets=3DNULL;
    = args.source=3Ds;
    = args.sourceLimit=3DsourceLimit;
    = args.target=3Dbuffer;
    = args.targetLimit=3Dbuffer+1;
    = args.size=3Dsizeof(args);


c= =3D=3D U_SENTINEL =3D=3D -1

    = if(c<0) {
        = /*
         * call the = native getNextUChar() implementation if we are
 =        * at a character boundary = (toULength=3D=3D0)
        =  *
         * unlike = with _toUnicode(), getNextUChar() implementations must set
         * U_TRUNCATED_CHAR_FOUND = for truncated input,
        =  * in addition to setting toULength/toUBytes[]
         */
        if(cnv->toULength=3D=3D0 = && cnv->sharedData->impl->getNextUChar!=3DNULL) = {
            = c=3Dcnv->sharedData->impl->getNextUChar(&args, = err);
            = *source=3Ds=3Dargs.source;
      =       if(*err=3D=3DU_INDEX_OUTOFBOUNDS_ERROR) {
                /* = reset the converter without calling the callback function */
                = _reset(cnv, UCNV_RESET_TO_UNICODE, FALSE);
  =               return 0xffff; /* no = output */

5. = Occurs when trying to access unindexed data.

            } else = if(U_SUCCESS(*err) && c>=3D0) {
  =               return = c;

6. = Returns symbol (can also be 0xfffd, as it is not treated as an actual = error).

So if I=E2=80=99m not = mistaken we will get results in our function either = from
=E2=80=98return=E2=80=99 number 5 or number 6 and the = following code will not be executed.


            /*
             * else fall = through to use _toUnicode() because
    =          *   UCNV_GET_NEXT_UCHAR_USE_TO_U: = the native function did not want to handle it after all
             *   = U_FAILURE: call _toUnicode() for callback handling (do not output = c)
            =  */
            = }
        }

      =   /* convert to one UChar in buffer[0], or handle getNextUChar() = errors */
        = _toUnicodeWithCallback(&args, err);

        = if(*err=3D=3DU_BUFFER_OVERFLOW_ERROR) {
  =           *err=3DU_ZERO_ERROR;
        }

        = i=3D0;
        = length=3D(int32_t)(args.target-buffer);
  =   } else {
        /* = write the lead surrogate from the overflow buffer */
        buffer[0]=3D(UChar)c;
        args.target=3Dbuffer+1;
        i=3D0;
  =       length=3D1;
    = }

  =   /* buffer contents starts at i and ends before length = */

  =   if(U_FAILURE(*err)) {
      =   c=3D0xffff; /* no output */
    } = else if(length=3D=3D0) {
      =   /* no input or only state changes */
  =       *err=3DU_INDEX_OUTOFBOUNDS_ERROR;
        /* no need to reset explicitly = because _toUnicodeWithCallback() did it */
  =       c=3D0xffff; /* no output */
    } else {
    =     c=3Dbuffer[0];
      =   i=3D1;
        = if(!U16_IS_LEAD(c)) {
        =     /* consume c=3Dbuffer[0], done */
        } else {
            /* got a lead = surrogate, see if a trail surrogate follows */
 =           UChar c2;

          =   if(cnv->UCharErrorBufferLength>0) {
                /* = got overflow output from the conversion */
  =               = if(U16_IS_TRAIL(c2=3Dcnv->UCharErrorBuffer[0])) {
                =     /* got a trail surrogate, too */
 =                   = c=3DU16_GET_SUPPLEMENTARY(c, c2);

          =           /* move the remaining overflow = contents up to the beginning */
    =                 = if((--cnv->UCharErrorBufferLength)>0) {
  =                     =   uprv_memmove(cnv->UCharErrorBuffer, = cnv->UCharErrorBuffer+1,
      =                     =           =  cnv->UCharErrorBufferLength*U_SIZEOF_UCHAR);
                =     }
          =       } else {
      =               /* c is an unpaired = lead surrogate, just return it */
    =             }
  =           } else if(args.source<sourceLimit) = {
              =   /* convert once more, to buffer[1] */
  =               = args.targetLimit=3Dbuffer+2;
      =           _toUnicodeWithCallback(&args, = err);
            =     if(*err=3D=3DU_BUFFER_OVERFLOW_ERROR) {
                =     *err=3DU_ZERO_ERROR;
    =             }

          =       length=3D(int32_t)(args.target-buffer);
                = if(U_SUCCESS(*err) && length=3D=3D2 && = U16_IS_TRAIL(c2=3Dbuffer[1])) {
    =                 /* got a trail = surrogate, too */
        =             c=3DU16_GET_SUPPLEMENTARY(c, = c2);
            =         i=3D2;
    =             }
  =           }
    =     }
    }

    = /*
     * move leftover output from = buffer[i..length[
     * into the = beginning of the overflow buffer
    =  */
    if(i<length) {
        /* move further overflow back = */
        int32_t = delta=3Dlength-i;
        = if((length=3Dcnv->UCharErrorBufferLength)>0) {
            = uprv_memmove(cnv->UCharErrorBuffer+delta, = cnv->UCharErrorBuffer,
      =                   =  length*U_SIZEOF_UCHAR);
      =   }
        = cnv->UCharErrorBufferLength=3D(int8_t)(length+delta);

      =   cnv->UCharErrorBuffer[0]=3Dbuffer[i++];
        if(delta>1) {
            = cnv->UCharErrorBuffer[1]=3Dbuffer[i];
  =       }
    }

    = *source=3Dargs.source;
    return = c;
}

--
WBR, Nikita Tatunov.

--
WBR, Nikita Tatunov.

= --Apple-Mail=_2CB2BB06-C3E3-4597-869F-1FA2BBD67596--