From: Mergen Imeev via Tarantool-patches
Reply-To: Mergen Imeev
Date: Mon, 25 Oct 2021 11:02:12 +0300
To: Vladislav Shpilevoy
Cc: tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH v1 01/21] sql: refactor CHAR() function
Message-ID: <20211025080212.GA36295@tarantool.org>
In-Reply-To: <52e168e4-1559-fd6c-c5a6-d98e3c2d678a@tarantool.org>
References: <7a6d7ca687a6e4d06b087af5f2e442042b38cf7b.1633713432.git.imeevma@gmail.com> <52e168e4-1559-fd6c-c5a6-d98e3c2d678a@tarantool.org>
List-Id: Tarantool development patches

Hi!
Thank you for the review! My answers, diff and new patch below. Also, I replaced
part of the old code with an ICU macro and changed the commit message.

On Fri, Oct 15, 2021 at 12:42:22AM +0200, Vladislav Shpilevoy wrote:
> Hi! Thanks for the patch!
>
> Before this commit on the branch I see a comment called 'Fix'.
> Please, cleanup the branch from unfinished work.
>
Understood, fixed.

> On 08.10.2021 19:31, imeevma@tarantool.org wrote:
> > Part of #4145
> > ---
> >  src/box/sql/func.c | 85 ++++++++++++++++++++++------------------------
> >  1 file changed, 40 insertions(+), 45 deletions(-)
> >
> > diff --git a/src/box/sql/func.c b/src/box/sql/func.c
> > index a3c7d8d20..dd5e7d785 100644
> > --- a/src/box/sql/func.c
> > +++ b/src/box/sql/func.c
> > @@ -738,6 +738,45 @@ func_substr_characters(struct sql_context *ctx, int argc, struct Mem *argv)
> >  		ctx->is_aborted = true;
> >  }
> >
> > +/** Implementation of the CHAR() function. */
>
> Please, keep the comments explaining what the non-trivial functions do.
>
Added comments for some functions. You will see these diffs in this and the next
few letters.

> > +static void
> > +func_char(struct sql_context *ctx, int argc, struct Mem *argv)
> > +{

Diff:

diff --git a/src/box/sql/func.c b/src/box/sql/func.c
index 6b5099826..dee28b852 100644
--- a/src/box/sql/func.c
+++ b/src/box/sql/func.c
@@ -717,43 +717,55 @@ func_substr_characters(struct sql_context *ctx, int argc, struct Mem *argv)
 		ctx->is_aborted = true;
 }
 
-/** Implementation of the CHAR() function. */
+/**
+ * Implementation of the CHAR() function.
+ *
+ * This function takes zero or more arguments, each of which is an integer. It
+ * constructs a string where each character of the string is the unicode
+ * character for the corresponding integer argument.
+ *
+ * If an argument is negative or greater than 0x10ffff, the symbol "�" is used.
+ * Symbol '\0' used instead of NULL argument.
+ */
 static void
 func_char(struct sql_context *ctx, int argc, struct Mem *argv)
 {
 	if (argc == 0)
 		return mem_set_str_static(ctx->pOut, "", 0);
-	char *str = sqlDbMallocRawNN(sql_get(), argc * 4);
-	if (str == NULL) {
+	struct region *region = &fiber()->gc;
+	size_t svp = region_used(region);
+	UChar32 *buf = region_alloc(region, argc * sizeof(*buf));
+	if (buf == NULL) {
 		ctx->is_aborted = true;
 		return;
 	}
-	uint8_t *ptr = (uint8_t *)str;
+	int len = 0;
 	for (int i = 0; i < argc; ++i) {
-		uint32_t c;
 		if (mem_is_null(&argv[i]))
-			c = 0;
+			buf[i] = 0;
 		else if (!mem_is_uint(&argv[i]) || argv[i].u.u > 0x10ffff)
-			c = 0xfffd;
+			buf[i] = 0xfffd;
 		else
-			c = argv[i].u.u;
-		if (c < 0x80) {
-			*ptr++ = c & 0xFF;
-		} else if (c < 0x0800) {
-			*ptr++ = 0xC0 + ((c >> 6) & 0x1F);
-			*ptr++ = 0x80 + (c & 0x3F);
-		} else if (c < 0x10000) {
-			*ptr++ = 0xE0 + ((c >> 12) & 0x0F);
-			*ptr++ = 0x80 + ((c >> 6) & 0x3F);
-			*ptr++ = 0x80 + (c & 0x3F);
-		} else {
-			*ptr++ = 0xF0 + ((c >> 18) & 0x07);
-			*ptr++ = 0x80 + ((c >> 12) & 0x3F);
-			*ptr++ = 0x80 + ((c >> 6) & 0x3F);
-			*ptr++ = 0x80 + (c & 0x3F);
-		}
+			buf[i] = argv[i].u.u;
+		len += U8_LENGTH(buf[i]);
 	}
-	mem_set_str_allocated(ctx->pOut, str, (char *)ptr - str);
+
+	char *str = sqlDbMallocRawNN(sql_get(), len);
+	if (str == NULL) {
+		ctx->is_aborted = true;
+		return;
+	}
+	int pos = 0;
+	for (int i = 0; i < argc; ++i) {
+		bool is_error = false;
+		U8_APPEND(str, pos, len, buf[i], is_error);
+		assert(!is_error);
+		(void)is_error;
+	}
+	region_truncate(region, svp);
+	assert(pos == len);
+	(void)pos;
+	mem_set_str_allocated(ctx->pOut, str, len);
 }
 
 static const unsigned char *

New patch:

commit 4fa0034165697b694b3c655d92a3661ebf80a027
Author: Mergen Imeev
Date:   Tue Oct 5 13:55:21 2021 +0300

    sql: rework CHAR() function

    The CHAR() function now uses the ICU macro to get characters.

    Part of #4145

diff --git a/src/box/sql/func.c b/src/box/sql/func.c
index afe34f7f0..dee28b852 100644
--- a/src/box/sql/func.c
+++ b/src/box/sql/func.c
@@ -717,6 +717,57 @@ func_substr_characters(struct sql_context *ctx, int argc, struct Mem *argv)
 		ctx->is_aborted = true;
 }
 
+/**
+ * Implementation of the CHAR() function.
+ *
+ * This function takes zero or more arguments, each of which is an integer. It
+ * constructs a string where each character of the string is the unicode
+ * character for the corresponding integer argument.
+ *
+ * If an argument is negative or greater than 0x10ffff, the symbol "�" is used.
+ * Symbol '\0' used instead of NULL argument.
+ */
+static void
+func_char(struct sql_context *ctx, int argc, struct Mem *argv)
+{
+	if (argc == 0)
+		return mem_set_str_static(ctx->pOut, "", 0);
+	struct region *region = &fiber()->gc;
+	size_t svp = region_used(region);
+	UChar32 *buf = region_alloc(region, argc * sizeof(*buf));
+	if (buf == NULL) {
+		ctx->is_aborted = true;
+		return;
+	}
+	int len = 0;
+	for (int i = 0; i < argc; ++i) {
+		if (mem_is_null(&argv[i]))
+			buf[i] = 0;
+		else if (!mem_is_uint(&argv[i]) || argv[i].u.u > 0x10ffff)
+			buf[i] = 0xfffd;
+		else
+			buf[i] = argv[i].u.u;
+		len += U8_LENGTH(buf[i]);
+	}
+
+	char *str = sqlDbMallocRawNN(sql_get(), len);
+	if (str == NULL) {
+		ctx->is_aborted = true;
+		return;
+	}
+	int pos = 0;
+	for (int i = 0; i < argc; ++i) {
+		bool is_error = false;
+		U8_APPEND(str, pos, len, buf[i], is_error);
+		assert(!is_error);
+		(void)is_error;
+	}
+	region_truncate(region, svp);
+	assert(pos == len);
+	(void)pos;
+	mem_set_str_allocated(ctx->pOut, str, len);
+}
+
 static const unsigned char *
 mem_as_ustr(struct Mem *mem)
 {
@@ -1450,50 +1501,6 @@ unicodeFunc(struct sql_context *context, int argc, struct Mem *argv)
 	sql_result_uint(context, sqlUtf8Read(&z));
 }
 
-/*
- * The char() function takes zero or more arguments, each of which is
- * an integer.  It constructs a string where each character of the string
- * is the unicode character for the corresponding integer argument.
- */
-static void
-charFunc(struct sql_context *context, int argc, struct Mem *argv)
-{
-	unsigned char *z, *zOut;
-	int i;
-	zOut = z = sql_malloc64(argc * 4 + 1);
-	if (z == NULL) {
-		context->is_aborted = true;
-		return;
-	}
-	for (i = 0; i < argc; i++) {
-		uint64_t x;
-		unsigned c;
-		if (sql_value_type(&argv[i]) == MP_INT)
-			x = 0xfffd;
-		else
-			x = mem_get_uint_unsafe(&argv[i]);
-		if (x > 0x10ffff)
-			x = 0xfffd;
-		c = (unsigned)(x & 0x1fffff);
-		if (c < 0x00080) {
-			*zOut++ = (u8) (c & 0xFF);
-		} else if (c < 0x00800) {
-			*zOut++ = 0xC0 + (u8) ((c >> 6) & 0x1F);
-			*zOut++ = 0x80 + (u8) (c & 0x3F);
-		} else if (c < 0x10000) {
-			*zOut++ = 0xE0 + (u8) ((c >> 12) & 0x0F);
-			*zOut++ = 0x80 + (u8) ((c >> 6) & 0x3F);
-			*zOut++ = 0x80 + (u8) (c & 0x3F);
-		} else {
-			*zOut++ = 0xF0 + (u8) ((c >> 18) & 0x07);
-			*zOut++ = 0x80 + (u8) ((c >> 12) & 0x3F);
-			*zOut++ = 0x80 + (u8) ((c >> 6) & 0x3F);
-			*zOut++ = 0x80 + (u8) (c & 0x3F);
-		}
-	}
-	sql_result_text64(context, (char *)z, zOut - z, sql_free);
-}
-
 /*
  * The hex() function.  Interpret the argument as a blob.  Return
  * a hexadecimal rendering as text.
@@ -1846,7 +1853,7 @@ static struct sql_func_definition definitions[] = {
 	 NULL},
 	{"AVG", 1, {FIELD_TYPE_INTEGER}, FIELD_TYPE_INTEGER, step_avg, fin_avg},
 	{"AVG", 1, {FIELD_TYPE_DOUBLE}, FIELD_TYPE_DOUBLE, step_avg, fin_avg},
-	{"CHAR", -1, {FIELD_TYPE_INTEGER}, FIELD_TYPE_STRING, charFunc, NULL},
+	{"CHAR", -1, {FIELD_TYPE_INTEGER}, FIELD_TYPE_STRING, func_char, NULL},
 	{"CHAR_LENGTH", 1, {FIELD_TYPE_STRING}, FIELD_TYPE_INTEGER, func_char_length,
 	 NULL},
 	{"COALESCE", -1, {FIELD_TYPE_ANY}, FIELD_TYPE_SCALAR, sql_builtin_stub,