From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from [87.239.111.99] (localhost [127.0.0.1]) by dev.tarantool.org (Postfix) with ESMTP id 304036FC8F; Tue, 23 Mar 2021 16:07:32 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org 304036FC8F DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=tarantool.org; s=dev; t=1616504852; bh=StE0qcvSyB0rOf4SbhTjZpRu/vYHk5w6SnCkYrRtiPI=; h=Date:To:References:In-Reply-To:Subject:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc: From; b=UmZsgR1+JvQYJMjzBXWXSEtl9MDv4uj2vbFIyo6m8ojpZJWZ/3IgjkEqhiFCEG9gS Kk5ZiILeMVh0cVh2jGzOy35XsCEvYMV7cz6qR15z+R5cMRW0jqaLcyPOafgERbcYoE zk/ZVJyOn1eKCKrIzuGJFWncko7e8a7mdRjNDHmI= Received: from smtp37.i.mail.ru (smtp37.i.mail.ru [94.100.177.97]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 0EDA56FC8F for ; Tue, 23 Mar 2021 16:07:31 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org 0EDA56FC8F Received: by smtp37.i.mail.ru with esmtpa (envelope-from ) id 1lOgkr-0006EV-JL; Tue, 23 Mar 2021 16:07:30 +0300 Date: Tue, 23 Mar 2021 16:07:28 +0300 To: Vladislav Shpilevoy Message-ID: <20210323130728.GB142065@tarantool.org> References: <9c119b89121aa3b0f121e104e078ea45db233db5.1613839653.git.imeevma@gmail.com> <31e665c3-08ff-c3ff-cac7-048d93894699@tarantool.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <31e665c3-08ff-c3ff-cac7-048d93894699@tarantool.org> X-7564579A: B8F34718100C35BD X-77F55803: 4F1203BC0FB41BD95D6E7CC48CB1F5F1C82687294EF6886B62CC997B0B8C95C4182A05F5380850405618DC6650BAA72159E2539A2010C6CAB742AFF5CD659F6BA543F94BC95BFF8F X-7FA49CB5: FF5795518A3D127A4AD6D5ED66289B5278DA827A17800CE74B44AB1D52BB6B9BC2099A533E45F2D0395957E7521B51C2CFCAF695D4D8E9FCEA1F7E6F0F101C6778DA827A17800CE7BF27F4FA2823CCF0EA1F7E6F0F101C67CDEEF6D7F21E0D1D174C73DBBBFC7664E4396F1E022E8FF2883DCF49B110B44655FA42C48E7D9054389733CBF5DBD5E913377AFFFEAFD269A417C69337E82CC2CC7F00164DA146DAFE8445B8C89999729449624AB7ADAF37F6B57BC7E64490611E7FA7ABCAF51C92176DF2183F8FC7C0D9442B0B5983000E8941B15DA834481F9449624AB7ADAF37BA3038C0950A5D3613377AFFFEAFD269176DF2183F8FC7C0333383368A6F7D0F7B076A6E789B0E97A8DF7F3B2552694A1E7802607F20496D49FD398EE364050FFAD5A440E159F97DA68A47777D5C6D9CB3661434B16C20AC78D18283394535A9E827F84554CEF5019E625A9149C048EE9ECD01F8117BC8BEE2021AF6380DFAD1CF19DD082D7633A0C77107234E2CFBA567F23339F89546C55F5C1EE8F4F765FC953A8A48A05D51F175ECD9A6C639B01BBD4B6F7A4D31EC0BC0CAF46E325F83A522CA9DD8327EE4930A3850AC1BE2E73542F54486E6D6388DC4224003CC836476C0CAF46E325F83A50BF2EBBBDD9D6B0F347543BADC64E7283B503F486389A921A5CC5B56E945C8DA X-C1DE0DAB: 0D63561A33F958A5F9DF9AB01D88DA52EAAB76EFB81D15549D596AB2F8D30B78D59269BC5F550898D99A6476B3ADF6B47008B74DF8BB9EF7333BD3B22AA88B938A852937E12ACA7502E6951B79FF9A3F410CA545F18667F91A7EA1CDA0B5A7A0 X-C8649E89: 4E36BF7865823D7055A7F0CF078B5EC49A30900B95165D34980A6B448CFD1B8AD6798244E1137A240B7A841A79DCE8585A1CE20684FC1C24F434D446153E0E271D7E09C32AA3244C1F29F940D342C0D78CF568E6843CE3DB24AF4FAF06DA24FD927AC6DF5659F194 X-D57D3AED: 3ZO7eAau8CL7WIMRKs4sN3D3tLDjz0dLbV79QFUyzQ2Ujvy7cMT6pYYqY16iZVKkSc3dCLJ7zSJH7+u4VD18S7Vl4ZUrpaVfd2+vE6kuoey4m4VkSEu530nj6fImhcD4MUrOEAnl0W826KZ9Q+tr5ycPtXkTV4k65bRjmOUUP8cvGozZ33TWg5HZplvhhXbhDGzqmQDTd6OAevLeAnq3Ra9uf7zvY2zzsIhlcp/Y7m53TZgf2aB4JOg4gkr2biojNqBGwjEnRoX0VndygqkbPQ== X-Mailru-Sender: 5C3750E245F362008BC1685FEC6306ED92BC14A40CA3A5CF59E2539A2010C6CA825871508D9F39705105BD0848736F9966FEC6BF5C9C28D97E07721503EA2E00ED97202A5A4E92BF7402F9BA4338D657ED14614B50AE0675 X-Mras: Ok Subject: Re: [Tarantool-patches] [PATCH v3 2/2] sql: Encapsulate MEM type changing and checking X-BeenThere: tarantool-patches@dev.tarantool.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , From: Mergen Imeev via Tarantool-patches Reply-To: Mergen Imeev Cc: tarantool-patches@dev.tarantool.org Errors-To: tarantool-patches-bounces@dev.tarantool.org Sender: "Tarantool-patches" Thank you for the review. My answers below. On Sun, Feb 28, 2021 at 06:35:46PM +0100, Vladislav Shpilevoy wrote: > Thanks for the patch! > > See 33 comments below. > > On 20.02.2021 17:59, imeevma@tarantool.org wrote: > > This patch encapsulates type changing and checking for MEM. This is done to make > > it easier for us to introduce new rules for implicit and explicit type casting > > and new types in SQL. > > > > Part of #5818 > > --- > > src/box/sql/func.c | 14 +- > > src/box/sql/sqlInt.h | 1 - > > src/box/sql/vdbe.c | 513 ++++++++++++++++++---------------------- > > src/box/sql/vdbeInt.h | 465 +++++++++++++++++++++++++++++++++--- > > src/box/sql/vdbeapi.c | 57 ++--- > > src/box/sql/vdbeaux.c | 360 ++++++++++++++-------------- > > src/box/sql/vdbemem.c | 146 +----------- > > src/box/sql/vdbesort.c | 9 +- > > src/box/sql/vdbetrace.c | 12 +- > > 9 files changed, 873 insertions(+), 704 deletions(-) > > > > diff --git a/src/box/sql/func.c b/src/box/sql/func.c > > index f15d27051..70b8f8eb7 100644 > > --- a/src/box/sql/func.c > > +++ b/src/box/sql/func.c > > @@ -283,13 +283,12 @@ port_lua_get_vdbemem(struct port *base, uint32_t *size) > > mem_set_u64(&val[i], field.ival); > > break; > > case MP_STR: > > - if (sqlVdbeMemSetStr(&val[i], field.sval.data, > > - field.sval.len, 1, > > - SQL_TRANSIENT) != 0) > > + if (mem_copy_str(&val[i], field.sval.data, > > + field.sval.len, false) != 0) > > 1. Having flags passed explicitly always makes the code harder to read. > Beause the reader has no idea what 'false' means. > > Also you have 4 cases when you calculate strlen() for the argument. > > Please, make it 3 separate functions: > > mem_copy_str() // Calls strlen() inside. > mem_copy_strn() // Does not include 0 byte. > mem_copy_strn0() // The name can be discussed. > Done. Also, I made separate functions for each allocation type. > > diff --git a/src/box/sql/vdbe.c b/src/box/sql/vdbe.c > > index 3b3b1f01d..8b87165ee 100644 > > --- a/src/box/sql/vdbe.c > > +++ b/src/box/sql/vdbe.c > > @@ -116,7 +116,7 @@ int sql_max_blobsize = 0; > > static void > > updateMaxBlobsize(Mem *p) > > { > > - if ((p->flags & (MEM_Str|MEM_Blob))!=0 && p->n>sql_max_blobsize) { > > + if (mem_is_varstring(p) && p->n>sql_max_blobsize) { > > 2. Please, add whitespaces around `>`. > I think I did so in new version. > > sql_max_blobsize = p->n; > > } > > } > > @@ -185,7 +185,7 @@ vdbeTakeBranch(int iSrcLine, u8 I, u8 M) > > * already. Return non-zero if a malloc() fails. > > */ > > #define Stringify(P) \ > > - if(((P)->flags&(MEM_Str|MEM_Blob))==0 && sqlVdbeMemStringify(P)) \ > > + if(!mem_is_varstring(P) && sqlVdbeMemStringify(P)) \ > > 3. Please, add whitespace after `if`. Also for error check we should > use `sqlVdbeMemStringify(P) != 0`, no implicit bool casts. > Same. > > @@ -528,16 +518,13 @@ mem_convert_to_numeric(struct Mem *mem, enum field_type type) > > * numeric type, if has one. Set the pMem->u.r and pMem->u.i fields > > * accordingly. > > */ > > -static u16 SQL_NOINLINE computeNumericType(Mem *pMem) > > +static bool SQL_NOINLINE computeNumericType(Mem *pMem, int64_t *i, bool *is_neg) > > 4. For functions doing something we use 0/-1 as success/error sign. Since you > decided to refactor this function. Also we don't use Camel naming notation in > the new code. This code can be considered new, since it is changed on 100%. > > It also does not reflect what the function is doing I think. It says 'compute type', > but does not return a type anymore. Besides, it is not about 'numeric' now. > Each part of the name is misleading. Maybe call it mem_convert_to_int()? Or just > inline it in its single usage place? > > And finally, we write return type on a separate line in the new code. > I inlined these functions. > > { > > - assert((pMem->flags & (MEM_Int | MEM_UInt | MEM_Real)) == 0); > > - assert((pMem->flags & (MEM_Str|MEM_Blob))!=0); > > - if (sqlAtoF(pMem->z, &pMem->u.r, pMem->n)==0) > > - return 0; > > - bool is_neg; > > - if (sql_atoi64(pMem->z, (int64_t *) &pMem->u.i, &is_neg, pMem->n) == 0) > > - return is_neg ? MEM_Int : MEM_UInt; > > - return MEM_Real; > > + assert(!mem_is_integer(pMem)); > > + assert(mem_is_varstring(pMem)); > > + if (sql_atoi64(pMem->z, i, is_neg, pMem->n) == 0) > > + return true; > > + return false; > > } > > > > /* > > @@ -547,14 +534,17 @@ static u16 SQL_NOINLINE computeNumericType(Mem *pMem) > > * Unlike mem_apply_numeric_type(), this routine does not modify pMem->flags. > > * But it does set pMem->u.r and pMem->u.i appropriately. > > */ > > -static u16 numericType(Mem *pMem) > > +static bool numericType(Mem *pMem, int64_t *i, bool *is_neg) > > 5. All the same comments here. Except that you can keep the function > itself probably. It is used more than once. > Same. > > { > > - if ((pMem->flags & (MEM_Int | MEM_UInt | MEM_Real)) != 0) > > - return pMem->flags & (MEM_Int | MEM_UInt | MEM_Real); > > - if (pMem->flags & (MEM_Str|MEM_Blob)) { > > - return computeNumericType(pMem); > > + if (mem_is_integer(pMem)) { > > + *i = pMem->u.i; > > + *is_neg = mem_is_neg_int(pMem); > > + return true; > > } > > - return 0; > > + if (mem_is_varstring(pMem)) { > > + return computeNumericType(pMem, i, is_neg); > > + } > > + return false; > > }> @@ -731,35 +718,32 @@ char * > > > > /* Allocate memory for internal VDBE structure on region. */ > > static int > > vdbe_mem_alloc_blob_region(struct Mem *vdbe_mem, uint32_t size) > > { > > - vdbe_mem->n = size; > > - vdbe_mem->z = region_alloc(&fiber()->gc, size); > > - if (vdbe_mem->z == NULL) > > + char *buf = region_alloc(&fiber()->gc, size); > > + if (buf == NULL) { > > + diag_set(OutOfMemory, size, "region_alloc", "buf"); > > return -1; > > - vdbe_mem->flags = MEM_Ephem | MEM_Blob; > > + } > > + mem_set_bin(vdbe_mem, buf, size, MEM_Ephem, false); > > 6. All the same as with mem_copy_str - 'false' does not give a > clue what the function does with that parameter. > Fixed. > > assert(sqlVdbeCheckMemInvariants(vdbe_mem)); > > return 0; > > } > > @@ -879,34 +863,13 @@ vdbe_field_ref_fetch(struct vdbe_field_ref *field_ref, uint32_t fieldno, > > if (vdbe_decode_msgpack_into_mem(data, dest_mem, &dummy) != 0) > > return -1; > > > > - /* > > - * MsgPack map, array or extension (unsupported in sql). > > - * Wrap it in a blob verbatim. > > - */ > > - if (dest_mem->flags == 0) { > > - dest_mem->z = (char *) data; > > - dest_mem->n = vdbe_field_ref_fetch_data(field_ref, > > - fieldno + 1) - data; > > - dest_mem->flags = MEM_Blob | MEM_Ephem | MEM_Subtype; > > - dest_mem->subtype = SQL_SUBTYPE_MSGPACK; > > - } > > 7. Why did you delete that? Seems like it was never necessary, right? > Maybe delete it in a separate commit? I saw Timur does not like having > many commits, but he is wrong. The fact of having many commits does not > affect patch readability. Only commit atomicity does. It is fine to > have a titanic commit as long as it is atomic. Just as it is fine to > have 40 commits if they are all atomic too. > > I didn't review v1 very thourougly, but at a first glance it looked > better. > Done. > > - /* > > - * Add 0 termination (at most for strings) > > - * Not sure why do we check MEM_Ephem > > - */ > > - if ((dest_mem->flags & (MEM_Ephem | MEM_Str)) == > > - (MEM_Ephem | MEM_Str)) { > > - int len = dest_mem->n; > > - if (dest_mem->szMalloc < len + 1) { > > - if (sqlVdbeMemGrow(dest_mem, len + 1, 1) != 0) > > - return -1; > > + if (mem_is_string(dest_mem) && (dest_mem->flags & MEM_Ephem) != 0) { > > 8. The patch looks incomplete while you still have to check flags manually > in the code which is not a part of Mem API. This function (vdbe_field_ref_fetch) > does not look like a part of Mem's API. It rather belongs to Vdbe. > Fixed. > > + if (dest_mem->n > 0) { > > + mem_copy_str(dest_mem, dest_mem->z, dest_mem->n, > > + (dest_mem->flags & MEM_Term) != 0); > > 9. The same here. Perhaps for this entire block you could add a > function mem_copy_strmem(dst, src). It would assert inside that the source > mem is a string object, and would hide all this inside. > I created mem_copy(), however, in this case I changed the funtion which decodes msgpack. > > } else { > > - dest_mem->z = > > - memcpy(dest_mem->zMalloc, dest_mem->z, len); > > - dest_mem->flags &= ~MEM_Ephem; > > + mem_set_str(dest_mem, "", 0, MEM_Static, true); > > } > > - dest_mem->z[len] = 0; > > - dest_mem->flags |= MEM_Term; > > } > > UPDATE_MAX_BLOBSIZE(dest_mem); > > dest_mem->field_type = vdbe_field_ref_fetch_type(field_ref, fieldno); > > @@ -1364,19 +1325,17 @@ case OP_String: { /* out2 */ > > */ > > case OP_Null: { /* out2 */ > > int cnt; > > - u16 nullFlag; > > pOut = vdbe_prepare_null_out(p, pOp->p2); > > cnt = pOp->p3-pOp->p2; > > assert(pOp->p3<=(p->nMem+1 - p->nCursor)); > > - pOut->flags = nullFlag = pOp->p1 ? (MEM_Null|MEM_Cleared) : MEM_Null; > > - pOut->n = 0; > > + if (pOp->p1) > > 10. We use != 0 instead of implicit cast. > Fixed. > > + pOut->flags |= MEM_Cleared; > > while( cnt>0) { > > pOut++; > > memAboutToChange(p, pOut); > > 11. Why isn't this included into each mem_set/mem_copy function? > This function works only for MEMs from VDBE. We do not have list of all MEMs. > > - sqlVdbeMemSetNull(pOut); > > - pOut->flags = nullFlag; > > - pOut->field_type = field_type_MAX; > > - pOut->n = 0; > > + mem_set_null(pOut); > > + if (pOp->p1) > > + pOut->flags |= MEM_Cleared; > > 12. To help performance you could keep `nullFlag` variable, > and set it to either 0 or MEM_Cleared. Then in the loop > you can do `pOut->flags |= nullFlag;` without any branches. > > By the way, the Cleared thing looks related to NULL type. It > changes its behaviour. Maybe it is worth encapsulating it too. > It depends on whether you will manage to extract OP_Ge/... > into a function not depending on Vdbe but only on mems. > > Which would be great as the comparison opcode now is enormous, > hard to read, and impossible to unit test. > I rewrote this in new version. > > cnt--; > > } > > break;> @@ -1608,7 +1566,7 @@ case OP_Concat: { /* same as TK_CONCAT, in1, in2, out3 */ > > } > > > > /* Moreover, both operands must be of the same type. */ > > - if (str_type_p1 != str_type_p2) { > > + if (!mems_have_same_type(pIn1, pIn2)) { > > 13. Better keep it 'mem', not 'mems'. To have all the Mem API functions > have the same prefix. > Fixed. > > diag_set(ClientError, ER_INCONSISTENT_TYPES, > > mem_type_to_str(pIn2), mem_type_to_str(pIn1)); > > goto abort_due_to_error; > > @@ -1619,21 +1577,19 @@ case OP_Concat: { /* same as TK_CONCAT, in1, in2, out3 */ > > if (nByte>db->aLimit[SQL_LIMIT_LENGTH]) { > > goto too_big; > > } > > - if (sqlVdbeMemGrow(pOut, (int)nByte+2, pOut==pIn2)) { > > - goto no_mem; > > + size_t svp = region_used(&fiber()->gc); > > + char *buf = region_alloc(&fiber()->gc, nByte); > > 14. Recently we confirmed that TLS access might be actually not as fast > as a normal variable access. Fiber() reads a TLS variable. So if you > care abot top perf, better cache &fiber()->gc in a variable. Up to you > though. I can't force such rule because can't (and don't want) to measure > if it really wins anything. > Thanks. For now I decided to not think too much about perf. > > + if (buf == NULL) { > > + diag_set(OutOfMemory, nByte, "region_alloc", "buf"); > > + goto abort_due_to_error; > > } > > - if (pIn1->flags & MEM_Str) > > - MemSetTypeFlag(pOut, MEM_Str); > > + memcpy(buf, pIn2->z, pIn2->n); > > + memcpy(&buf[pIn2->n], pIn1->z, pIn1->n); > > + if (mem_is_binary(pIn1)) > > + mem_copy_bin(pOut, buf, nByte, false); > > else > > - MemSetTypeFlag(pOut, MEM_Blob); > > - if (pOut!=pIn2) { > > - memcpy(pOut->z, pIn2->z, pIn2->n); > > - } > > - memcpy(&pOut->z[pIn2->n], pIn1->z, pIn1->n); > > - pOut->z[nByte]=0; > > - pOut->z[nByte+1] = 0; > > - pOut->flags |= MEM_Term; > > - pOut->n = (int)nByte; > > + mem_copy_str(pOut, buf, nByte, false); > > + region_truncate(&fiber()->gc, svp); > > 15. AFAIU, before your patch it was allocated in place, with sqlVdbeMemGrow() > used to reserve the space. Why did you change it? Looks slower. Before your > changes there was 2 copies at most: out mem realloc + second mem copy. Now > there are 4 copies: copy original + new into new place, copy the 2 copies > into the out mem. > Rewrote this part. > > UPDATE_MAX_BLOBSIZE(pOut); > > break; > > } > > @@ -1681,27 +1637,25 @@ case OP_Subtract: /* same as TK_MINUS, in1, in2, out3 */ > > case OP_Multiply: /* same as TK_STAR, in1, in2, out3 */ > > case OP_Divide: /* same as TK_SLASH, in1, in2, out3 */ > > case OP_Remainder: { /* same as TK_REM, in1, in2, out3 */ > > - u32 flags; /* Combined MEM_* flags from both inputs */ > > - u16 type1; /* Numeric type of left operand */ > > - u16 type2; /* Numeric type of right operand */ > > + bool is_in1_int; /* Numeric type of left operand */ > > + bool is_in2_int; /* Numeric type of right operand */ > > 16. The comments are wrong now. > Rewrote this part. > > i64 iA; /* Integer value of left operand */ > > i64 iB; /* Integer value of right operand */ > > + bool is_lhs_neg; > > + bool is_rhs_neg; > > double rA; /* Real value of left operand */ > > double rB; /* Real value of right operand */ > > > > pIn1 = &aMem[pOp->p1]; > > - type1 = numericType(pIn1); > > + is_in1_int = numericType(pIn1, (int64_t *)&iA, &is_lhs_neg); > > pIn2 = &aMem[pOp->p2]; > > - type2 = numericType(pIn2); > > + is_in2_int = numericType(pIn2, (int64_t *)&iB, &is_rhs_neg); > > pOut = vdbe_prepare_null_out(p, pOp->p3); > > - flags = pIn1->flags | pIn2->flags; > > - if ((flags & MEM_Null)!=0) goto arithmetic_result_is_null; > > - if ((type1 & (MEM_Int | MEM_UInt)) != 0 && > > - (type2 & (MEM_Int | MEM_UInt)) != 0) { > > - iA = pIn1->u.i; > > - iB = pIn2->u.i; > > - bool is_lhs_neg = pIn1->flags & MEM_Int; > > - bool is_rhs_neg = pIn2->flags & MEM_Int; > > + if (mem_is_null(pIn1) || mem_is_null(pIn2)) > > + goto arithmetic_result_is_null; > > + if (is_in1_int && is_in2_int) { > > + bool is_lhs_neg = mem_is_neg_int(pIn1); > > + bool is_rhs_neg = mem_is_neg_int(pIn2); > > 17. You already know is_lhs_neg and is_rhs_neg from the > numericType() calls above. > True, but this will change test. I was asked to not change behaviour during this refactoring. I added comment in new version. > > bool is_res_neg; > > switch( pOp->opcode) { > > case OP_Add: { > > @@ -2265,10 +2212,10 @@ case OP_Ge: { /* same as TK_GE, jump, in1, in3 */ > > * or not both operands are null. > > */ > > assert(pOp->opcode==OP_Eq || pOp->opcode==OP_Ne); > > - assert((flags1 & MEM_Cleared)==0); > > + assert((pIn1->flags & MEM_Cleared)==0); > > assert((pOp->p5 & SQL_JUMPIFNULL)==0); > > - if ((flags1&flags3&MEM_Null)!=0 > > - && (flags3&MEM_Cleared)==0 > > + if (mem_is_null(pIn1) && mem_is_null(pIn3) > > + && (pIn3->flags & MEM_Cleared)==0 > > 18. This is what I was talking about with Cleared. It seems to be > an internal flag of Mem which is needed only for the comparison. > Which means you probably should hide it. With the entire comparison > opcode code along. We already have sqlMemCompare. Why does not > the opcode use it? Probably worth doing something with this since > you are working on the encapsulation. > > To get the comparison code in order and to hide Mem details in it. > Rewrote this part. Not sure that this comment was answered. > > ) { > > res = 0; /* Operands are equal */ > > } else { > > @@ -2314,18 +2261,25 @@ case OP_Ge: { /* same as TK_GE, jump, in1, in3 */ > > res = sqlMemCompare(pIn3, pIn1, NULL); > > } else { > > enum field_type type = pOp->p5 & FIELD_TYPE_MASK; > > + struct Mem tmp_mem1, tmp_mem2, *mem1, *mem2; > > + mem1 = pIn1; > > + mem2 = pIn3; > > 19. Wtf? Why do you need to copy the mems? The original code worked > fine without them. It is not just API refactoring then. If this is > really necessary, lets do it in a separate patch with clear explanation > why you need it. > Rewrote this part. > > if (sql_type_is_numeric(type)) { > > - if ((flags1 | flags3)&MEM_Str) { > > - if ((flags1 & MEM_Str) == MEM_Str) { > > - mem_apply_numeric_type(pIn1); > > - testcase( flags3!=pIn3->flags); /* Possible if pIn1==pIn3 */ > > - flags3 = pIn3->flags; > > + if (mem_is_string(mem1) || mem_is_string(mem2)) { > > + if (mem_is_string(mem1)) { > > + mem_init(&tmp_mem1); > > 20. Constructors should have 'create' suffix. > Fixed. > > + memcpy(&tmp_mem1, mem1, sizeof(*mem1)); > > + mem1 = &tmp_mem1; > > + mem_apply_numeric_type(mem1); > > } > > - if ((flags3 & MEM_Str) == MEM_Str) { > > - if (mem_apply_numeric_type(pIn3) != 0) { > > + if (mem_is_string(mem2)) { > > + mem_init(&tmp_mem2); > > + memcpy(&tmp_mem2, mem2, sizeof(*mem2)); > > + mem2 = &tmp_mem2; > > + if (mem_apply_numeric_type(mem2) != 0) { > > diag_set(ClientError, > > ER_SQL_TYPE_MISMATCH, > > - sql_value_to_diag_str(pIn3), > > + sql_value_to_diag_str(mem2), > > "numeric"); > > goto abort_due_to_error; > > } > > @@ -3860,14 +3797,14 @@ case OP_FCopy: { /* out2 */ > > pIn1 = &aMem[pOp->p1]; > > } > > > > - if ((pOp->p3 & OPFLAG_NOOP_IF_NULL) && (pIn1->flags & MEM_Null)) { > > + if ((pOp->p3 & OPFLAG_NOOP_IF_NULL) && mem_is_null(pIn1)) { > > pOut = vdbe_prepare_null_out(p, pOp->p2); > > } else { > > assert(memIsValid(pIn1)); > > - assert((pIn1->flags & (MEM_Int | MEM_UInt)) != 0); > > + assert(mem_is_integer(pIn1)); > > > > pOut = vdbe_prepare_null_out(p, pOp->p2); > > - mem_set_int(pOut, pIn1->u.i, pIn1->flags == MEM_Int); > > + mem_set_int(pOut, pIn1->u.i, mem_is_neg_int(pIn1)); > > 21. Perhaps it is worth adding mem_set_intmem(Mem* src). You won't > need to call mem_is_neg_int() then. -1 branch. > Rewrote this part. > > pOut->field_type = pIn1->field_type; > > } > > break; > > @@ -5252,7 +5187,7 @@ case OP_AggFinal: { > > Mem *pMem; > > assert(pOp->p1>0 && pOp->p1<=(p->nMem+1 - p->nCursor)); > > pMem = &aMem[pOp->p1]; > > - assert((pMem->flags & ~(MEM_Null|MEM_Agg))==0); > > + assert(mem_is_null(pMem) || (pMem->flags & MEM_Agg) != 0); > > 22. MEM_Agg says "Mem.z points to an agg function context". So it > is related to mem types. And probably there must be a helper > mem_is_aggctx(). > Fixed by adding new check functions. > > if (sql_vdbemem_finalize(pMem, pOp->p4.func) != 0) > > goto abort_due_to_error; > > UPDATE_MAX_BLOBSIZE(pMem); > > diff --git a/src/box/sql/vdbeInt.h b/src/box/sql/vdbeInt.h > > index 7205f1af3..a80a7b511 100644 > > --- a/src/box/sql/vdbeInt.h > > +++ b/src/box/sql/vdbeInt.h > > @@ -484,41 +484,455 @@ void sqlVdbeMemMove(Mem *, Mem *); > > int sqlVdbeMemNulTerminate(Mem *); > > int sqlVdbeMemSetStr(Mem *, const char *, int, u8, void (*)(void *)); > > > > +/** > > + * Memory cell mem contains the context of an aggregate function. > > + * This routine calls the finalize method for that function. The > > + * result of the aggregate is stored back into mem. > > + * > > + * Returns -1 if the finalizer reports an error. 0 otherwise. > > + */ > > +int > > +sql_vdbemem_finalize(struct Mem *mem, struct func *func); > > + > > +/* > > 23. Comments out of function bodies should start from /**. > Fixed in new functions. However, I believe I didn't change this in functions that were just moved. > > + * If the memory cell contains a value that must be freed by > > + * invoking the external callback in Mem.xDel, then this routine > > + * will free that value. It also sets Mem.flags to MEM_Null. > > + */ > > void > > -mem_set_bool(struct Mem *mem, bool value); > > +vdbeMemClearExternAndSetNull(Mem * p); > > 24. Please, don't use whitespace after `*`. Here and in the > next declaration. > Same. > > + > > +/* > > + * Change the pMem->zMalloc allocation to be at least szNew bytes. > > 25. There is no `szNew` argument. > Just copied this function for now, didn't change comments. > > + * If pMem->zMalloc already meets or exceeds the requested size, this > > + * routine is a no-op. > > + * > > + * Any prior string or blob content in the pMem object may be discarded. > > + * The pMem->xDel destructor is called, if it exists. Though MEM_Str > > + * and MEM_Blob values may be discarded, MEM_Int, MEM_Real, and MEM_Null > > + * values are preserved. > > + * > > + * Return 0 on success or -1 if unable to complete the resizing. > > + */ > > +int > > +sqlVdbeMemClearAndResize(Mem * pMem, int n); > > + > > +/** > > + * Initialize a new mem. After initializing the mem will hold a NULL value. > > + * > > + * @param mem VDBE memory register to initialize. > > 26. I will remind just in case - using doxygen is not obligatory. Especially > for such trivial functions. > There is a lot less comments in new version, at least for now... > > diff --git a/src/box/sql/vdbeapi.c b/src/box/sql/vdbeapi.c > > index 7c59ef83f..338853495 100644 > > --- a/src/box/sql/vdbeapi.c > > +++ b/src/box/sql/vdbeapi.c > > @@ -588,39 +588,17 @@ sql_data_count(sql_stmt * pStmt) > > static const Mem * > > columnNullValue(void) > > { > > - /* Even though the Mem structure contains an element > > - * of type i64, on certain architectures (x86) with certain compiler > > - * switches (-Os), gcc may align this Mem object on a 4-byte boundary > > - * instead of an 8-byte one. This all works fine, except that when > > - * running with SQL_DEBUG defined the sql code sometimes assert()s > > - * that a Mem structure is located on an 8-byte boundary. To prevent > > - * these assert()s from failing, when building with SQL_DEBUG defined > > - * using gcc, we force nullMem to be 8-byte aligned using the magical > > - * __attribute__((aligned(8))) macro. > > - */ > > - static const Mem nullMem > > #if defined(SQL_DEBUG) && defined(__GNUC__) > > - __attribute__ ((aligned(8))) > > -#endif > > - = { > > - /* .u = */ { > > - 0}, > > - /* .flags = */ (u16) MEM_Null, > > - /* .eSubtype = */ (u8) 0, > > - /* .field_type = */ field_type_MAX, > > - /* .n = */ (int)0, > > - /* .z = */ (char *)0, > > - /* .zMalloc = */ (char *)0, > > - /* .szMalloc = */ (int)0, > > - /* .uTemp = */ (u32) 0, > > - /* .db = */ (sql *) 0, > > - /* .xDel = */ (void (*)(void *))0, > > -#ifdef SQL_DEBUG > > - /* .pScopyFrom = */ (Mem *) 0, > > - /* .pFiller = */ (void *)0, > > + static struct Mem nullMem __attribute__ ((aligned(8))); > > +#else > > + static struct Mem nullMem; > > #endif > > - }; > > - return &nullMem; > > + static struct Mem *null_mem_ptr = NULL; > > + if (null_mem_ptr == NULL) { > > + mem_init(&nullMem); > > + null_mem_ptr = &nullMem; > > + } > > + return null_mem_ptr; > > 27. The old code was a bit ugly, yeah, but it didn't have > branches. It is fine to keep the old code, because it is > internal part of Mem API anyway. At least it should be. > Fixed. Also, at some point I removed this function. > > } > > diff --git a/src/box/sql/vdbeaux.c b/src/box/sql/vdbeaux.c > > index 5c2706e49..fb55f82a6 100644 > > --- a/src/box/sql/vdbeaux.c > > +++ b/src/box/sql/vdbeaux.c > > @@ -1198,17 +1198,13 @@ sqlVdbePrintOp(FILE * pOut, int pc, Op * pOp) > > * Initialize an array of N Mem element. > > */ > > static void > > -initMemArray(Mem * p, int N, sql * db, u32 flags) > > +initMemArray(Mem *p, int N, bool is_undefined) > > { > > - while ((N--) > 0) { > > - p->db = db; > > - p->flags = flags; > > - p->szMalloc = 0; > > - p->field_type = field_type_MAX; > > -#ifdef SQL_DEBUG > > - p->pScopyFrom = 0; > > -#endif > > - p++; > > + for (int i = 0; i < N; ++i) { > > + struct Mem *mem = &p[i]; > > + mem_init(mem); > > + if (is_undefined) > > + mem_set_undefined(mem); > > } > > 28. Maybe just inline this entire function. > Inlined. > > } > > @@ -1382,13 +1376,25 @@ sqlVdbeList(Vdbe * p) > > if (apSub[j] == pOp->p4.pProgram) > > break; > > } > > - if (j == nSub && > > - sqlVdbeMemGrow(pSub, nByte, > > - nSub != 0) == 0) { > > - apSub = (SubProgram **) pSub->z; > > - apSub[nSub++] = pOp->p4.pProgram; > > - pSub->flags |= MEM_Blob; > > - pSub->n = nSub * sizeof(SubProgram *); > > + if (j == nSub) { > > + size_t svp = region_used(&fiber()->gc); > > + struct SubProgram **buf = (SubProgram **) > > + region_aligned_alloc(&fiber()->gc, > > + nByte, > > + alignof(struct SubProgram)); > > + if (buf == NULL) { > > + diag_set(OutOfMemory, nByte, > > + "region_aligned_alloc", > > + "buf"); > > + p->is_aborted = true; > > + return -1; > > + } > > + if (nSub > 0) > > + memcpy(buf, pSub->z, pSub->n); > > + buf[nSub++] = pOp->p4.pProgram; > > + mem_copy_bin(pSub, (char *)buf, nByte, > > + false); > > + region_truncate(&fiber()->gc, svp); > > } > > 29. What was wrong with keeping a 'grow' function? Such changes > only increase number of copies and pad out the code. > Rewrote this part. > > } > > } > > @@ -1402,41 +1408,39 @@ sqlVdbeList(Vdbe * p) > > mem_set_i64(pMem, pOp->p3); > > pMem++; > > > > - if (sqlVdbeMemClearAndResize(pMem, 256)) { > > - assert(p->db->mallocFailed); > > + size_t size = 256; > > + size_t svp = region_used(&fiber()->gc); > > + char *buf = (char *)region_alloc(&fiber()->gc, size); > > + if (buf == NULL) { > > + diag_set(OutOfMemory, size, "region_alloc", "buf"); > > + p->is_aborted = true; > > return -1; > > } > > - pMem->flags = MEM_Str | MEM_Term; > > - zP4 = displayP4(pOp, pMem->z, pMem->szMalloc); > > - > > - if (zP4 != pMem->z) { > > - pMem->n = 0; > > - sqlVdbeMemSetStr(pMem, zP4, -1, 1, 0); > > - } else { > > - assert(pMem->z != 0); > > - pMem->n = sqlStrlen30(pMem->z); > > - } > > + zP4 = displayP4(pOp, buf, size); > > + mem_copy_str(pMem, zP4, strlen(zP4), true); > > + region_truncate(&fiber()->gc, svp); > > pMem++; > > > > if (p->explain == 1) { > > - if (sqlVdbeMemClearAndResize(pMem, 4)) { > > - assert(p->db->mallocFailed); > > - return -1; > > - } > > - pMem->flags = MEM_Str | MEM_Term; > > - pMem->n = 2; > > - sql_snprintf(3, pMem->z, "%.2x", pOp->p5); /* P5 */ > > + char *str = (char *)tt_sprintf("%02hu", pOp->p5); > > + mem_copy_str(pMem, str, strlen(str), true); > > 30. Probably Timur didn't know, but tt_sprintf() also is static > alloc. I see no sense in avoiding it if you know what you are doing. > > The same below for 512 byte allocation and above for 256. > > If someone has issues with static alloc, I propose them to create a > ticket on making its API more safe. For example, introduce static_free() > and use it after each static_alloc() to prevent unintended overwrites. > Would be a good idea to increase the safety. > > > @@ -2418,80 +2422,72 @@ sqlBlobCompare(const Mem * pB1, const Mem * pB2) > > int > > sqlMemCompare(const Mem * pMem1, const Mem * pMem2, const struct coll * pColl) > > { > > - int f1, f2; > > - int combined_flags; > > - > > - f1 = pMem1->flags; > > - f2 = pMem2->flags; > > - combined_flags = f1 | f2; > > - > > /* If one value is NULL, it is less than the other. If both values > > * are NULL, return 0. > > */ > > - if (combined_flags & MEM_Null) { > > - return (f2 & MEM_Null) - (f1 & MEM_Null); > > - } > > + if (mem_is_null(pMem1) || mem_is_null(pMem2)) > > + return (int)mem_is_null(pMem2) - (int)mem_is_null(pMem1); > > > > - if ((combined_flags & MEM_Bool) != 0) { > > - if ((f1 & f2 & MEM_Bool) != 0) { > > + if (mem_is_bool(pMem1) || mem_is_bool(pMem2)) { > > + if (mem_is_bool(pMem1) && mem_is_bool(pMem2)) { > > 31. Issue with such changes is that the code is a part of internal > Mem's code. I can tell it from the code itsef (it works with > mems only), and from the name (sqlMemCompare). So it could afford > to touch internal members to achieve the best perf. I think it was > totally fine as long as it happened only inside of Mem API functions, > where you are supposed to know meaning of the fields. > > This probably arises from the code being a total mess when Mem > functions are defined all over the common files and don't have > their own place. Where you could define big internal functions in > mem.c, and do there whatever you want. > > The perf issues you are having might also be because of that. > > For instance, in this function 'combined_flags' allowed to do only > one branch per check. Now you do two. Was 'combined & type_flags' > became 'mem1 & type_flags || mem2 & type_flags'. `||` is a branch. > Rewrote this part. > > if (pMem1->u.b == pMem2->u.b) > > return 0; > > if (pMem1->u.b) > > return 1; > > return -1; > > } > > - if ((f2 & MEM_Bool) != 0) > > + if (mem_is_bool(pMem2)) > > return +1; > > return -1; > > } > > @@ -2613,7 +2609,12 @@ sqlVdbeCompareMsgpack(const char **key1, > > { > > const char *aKey1 = *key1; > > Mem *pKey2 = unpacked->aMem + key2_idx; > > - Mem mem1; > > + bool b; > > + uint64_t u; > > + int64_t i; > > + double r; > > + const char *s; > > + size_t size; > > 32. What was wrong with declaring these variables in-place? > Nothing. In new version this part is reverted, I believe. > > int rc = 0; > > switch (mp_typeof(*aKey1)) { > > default:{ > > diff --git a/src/box/sql/vdbemem.c b/src/box/sql/vdbemem.c > > index 1101f7205..a9b552b9c 100644 > > --- a/src/box/sql/vdbemem.c > > +++ b/src/box/sql/vdbemem.c > > @@ -823,63 +750,6 @@ sqlVdbeMemSetZeroBlob(Mem * pMem, int n) > > pMem->z = 0; > > } > > > > -void > > -mem_set_bool(struct Mem *mem, bool value) > > -{ > > - sqlVdbeMemSetNull(mem); > > - mem->u.b = value; > > - mem->flags = MEM_Bool; > > - mem->field_type = FIELD_TYPE_BOOLEAN; > > -} > > - > > -void > > -mem_set_i64(struct Mem *mem, int64_t value) > > -{ > > - if (VdbeMemDynamic(mem)) > > - sqlVdbeMemSetNull(mem); > > - mem->u.i = value; > > - int flag = value < 0 ? MEM_Int : MEM_UInt; > > - MemSetTypeFlag(mem, flag); > > - mem->field_type = FIELD_TYPE_INTEGER; > > -} > > - > > -void > > -mem_set_u64(struct Mem *mem, uint64_t value) > > -{ > > - if (VdbeMemDynamic(mem)) > > - sqlVdbeMemSetNull(mem); > > - mem->u.u = value; > > - MemSetTypeFlag(mem, MEM_UInt); > > - mem->field_type = FIELD_TYPE_UNSIGNED; > > -} > > - > > -void > > -mem_set_int(struct Mem *mem, int64_t value, bool is_neg) > > -{ > > - if (VdbeMemDynamic(mem)) > > - sqlVdbeMemSetNull(mem); > > - if (is_neg) { > > - assert(value < 0); > > - mem->u.i = value; > > - MemSetTypeFlag(mem, MEM_Int); > > - } else { > > - mem->u.u = value; > > - MemSetTypeFlag(mem, MEM_UInt); > > - } > > - mem->field_type = FIELD_TYPE_INTEGER; > > -} > > - > > -void > > -mem_set_double(struct Mem *mem, double value) > > -{ > > - sqlVdbeMemSetNull(mem); > > - if (sqlIsNaN(value)) > > - return; > > - mem->u.r = value; > > - MemSetTypeFlag(mem, MEM_Real); > > - mem->field_type = FIELD_TYPE_DOUBLE; > > -} > > 33. So these functions were defined in C file, and there was no > perf issues with them. Doesn't it mean something became wrong just > now? > I believe that most part of perf changes happened due to mem_is_*() functions. Still, in new version I was told to not think too much about perf, at least for now. > > - > > /* > > * Return true if the Mem object contains a TEXT or BLOB that is > > * too large - whose size exceeds SQL_MAX_LENGTH. > Overall I think you need to create a plan how to solve the Mem > mess. > > A first step would be to locate all Mem API functions. It is not > clear now which of them are about Mem only, and which just happened > to have 'mem' in their name. And which don't have 'mem' in their name > but are a part of Mem API. > > Then I would propose to move all the mem code to its own couple of > files if possible. If not - break the dependencies of its functions > from the other subsystems and modules, and then move. > > Then use proper names for the existing functions. > > Then introduce new functions. > > This way it should be easier to see which functions are private (they > can touch flags, do combined_flags, do various hacks on Mems, but can't > be used out of mem.c and mem.h), and which are public (the other functions). > > This would do the real encapsulation. I believe I did this, however, I was not very strict with order. Maybe this should be fixed in next versions.