From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from [87.239.111.99] (localhost [127.0.0.1]) by dev.tarantool.org (Postfix) with ESMTP id 6862F6EC5E; Mon, 30 Aug 2021 17:11:45 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org 6862F6EC5E DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=tarantool.org; s=dev; t=1630332705; bh=K87gf0N0mNQL++yrF84/92ypgFDg99sA7Fcsj8tAPas=; h=Date:To:References:In-Reply-To:Subject:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc: From; b=YSBmZNdqXjhDL8UMfyQ33J0DGkZJ4OlPTpFXfQrmmAO2SnAXjW1t4KTmL2qGXDQKW skb0HmdJlDxcp4PdhI4UUEQuSNBgMFOby7zwpYq8+dU3MDMjA763JLEJBicGLUBXfi pIxuNIy4d4Dr1X0Vus8U8zlpVLeeB3tbk2iAOdUk= Received: from smtp63.i.mail.ru (smtp63.i.mail.ru [217.69.128.43]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 86B2F6EC5E for ; Mon, 30 Aug 2021 17:11:44 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org 86B2F6EC5E Received: by smtp63.i.mail.ru with esmtpa (envelope-from ) id 1mKi0l-0007w8-EQ; Mon, 30 Aug 2021 17:11:43 +0300 Date: Mon, 30 Aug 2021 17:10:22 +0300 To: Maxim Kokryashkin Message-ID: References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-4EC0790: 10 X-7564579A: 646B95376F6C166E X-77F55803: 4F1203BC0FB41BD92087353F0EC44DD910164DC12A5633065676A9727AC27C74182A05F5380850402EEC8B0B4986500D938DAFAF66F9DD8F196429EA1959754624254C4CCA35FC10 X-7FA49CB5: FF5795518A3D127A4AD6D5ED66289B5278DA827A17800CE761966F250AC1AE21EA1F7E6F0F101C67BD4B6F7A4D31EC0BCC500DACC3FED6E28638F802B75D45FF8AA50765F7900637BD378188104BC8BE8638F802B75D45FF36EB9D2243A4F8B5A6FCA7DBDB1FC311F39EFFDF887939037866D6147AF826D85E7BF011C1990E4326DBA078A453481A117882F4460429724CE54428C33FAD305F5C1EE8F4F765FCAA867293B0326636D2E47CDBA5A96583BD4B6F7A4D31EC0BC014FD901B82EE079FA2833FD35BB23D27C277FBC8AE2E8B974A882099E279BDA471835C12D1D977C4224003CC8364762BB6847A3DEAEFB0F43C7A68FF6260569E8FC8737B5C2249EC8D19AE6D49635B68655334FD4449CB9ECD01F8117BC8BEAAAE862A0553A39223F8577A6DFFEA7C089C7B220955F2D643847C11F186F3C59DAA53EE0834AAEE X-C1DE0DAB: 0D63561A33F958A54C280D8CAC2ACE3E8346B2902AD725BE2DD57C368D000125D59269BC5F550898D99A6476B3ADF6B47008B74DF8BB9EF7333BD3B22AA88B938A852937E12ACA756888F6138A55F47E410CA545F18667F91A7EA1CDA0B5A7A0 X-C8649E89: 4E36BF7865823D7055A7F0CF078B5EC49A30900B95165D34ADE558D2B396DA7CA25C426D7D7E89C8D28EEDE4BA9D5488CABD803C85420D32F9B2897F554389B61D7E09C32AA3244C03C56E361251AFD18DCAE373FAF7945F3FD9C8CA1B0515E0927AC6DF5659F194 X-D57D3AED: 3ZO7eAau8CL7WIMRKs4sN3D3tLDjz0dLbV79QFUyzQ2Ujvy7cMT6pYYqY16iZVKkSc3dCLJ7zSJH7+u4VD18S7Vl4ZUrpaVfd2+vE6kuoey4m4VkSEu530nj6fImhcD4MUrOEAnl0W826KZ9Q+tr5ycPtXkTV4k65bRjmOUUP8cvGozZ33TWg5HZplvhhXbhDGzqmQDTd6OAevLeAnq3Ra9uf7zvY2zzsIhlcp/Y7m53TZgf2aB4JOg4gkr2bioj4DnN7V8kJ6uyY//Fn8i//g== X-Mailru-Sender: 3B9A0136629DC91206CBC582EFEF4CB41E4A2E65CC1224533696A8ED511BBC449CA59016CC677404F2400F607609286E924004A7DEC283833C7120B22964430C52B393F8C72A41A89437F6177E88F7363CDA0F3B3F5B9367 X-Mras: Ok Subject: Re: [Tarantool-patches] [PATCH luajit v2 2/3] memprof: extend symtab with information about .so libs X-BeenThere: tarantool-patches@dev.tarantool.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , From: Sergey Kaplun via Tarantool-patches Reply-To: Sergey Kaplun Cc: tarantool-patches@dev.tarantool.org Errors-To: tarantool-patches-bounces@dev.tarantool.org Sender: "Tarantool-patches" | memprof: extend symtab with information about .so libs Width limit for commit title is 50 symbols including subsystem name [1]. Commit message should fit 72 symbols line width [1]. Also, you noticed [2], that just dump all symbols from so libraries shown better performance than the current solution. So looks like we want to reimplement this patch in this way, don't we? So, for now I stop reviewing (this patch is the last one) until we don't make the decision about implementation. On 20.08.21, Maxim Kokryashkin wrote: > This commit enriches memprof's symbol table with information about loaded > ".so" files. That information will provide demangling capabilities to > the parser. > > The following data is stored for each ".so" library: > | SYMTAB_SO | path to .so | it's address | CRC32 Hash | > 1 byte 8 bytes 8 bytes > magic > number > > C symbol address from profile events with an address of the ".so" file will > give us an offset, and it can be used to get the symbol in that file. > > CRC32 hash will be used by the parser to determine whether the shared > object that was used during the process profiling and during the process > of parsing is the same. > > Part of tarantool/tarantool#5813 > --- > src/lj_memprof.c | 60 ++++++++++++++++++++++++++++++++++++++++++++++++ > src/lj_memprof.h | 3 ++- > 2 files changed, 62 insertions(+), 1 deletion(-) > > diff --git a/src/lj_memprof.c b/src/lj_memprof.c > index 2c1ef3b8..c8367c95 100644 > --- a/src/lj_memprof.c > +++ b/src/lj_memprof.c > @@ -5,10 +5,15 @@ > ** Copyright (C) 2015-2019 IPONWEB Ltd. > */ > > +#define _GNU_SOURCE Should it be defind exactly behind ? And even undefined after? > + > #define lj_memprof_c > #define LUA_CORE > > #include > +#include > +#include Typo: this include looks redundant. > +#include > > #include "lj_arch.h" > #include "lj_memprof.h" > @@ -18,12 +23,64 @@ > #include "lj_obj.h" > #include "lj_frame.h" > #include "lj_debug.h" > +#include "lj_utils.h" > > /* --------------------------------- Symtab --------------------------------- */ > > static const unsigned char ljs_header[] = {'l', 'j', 's', LJS_CURRENT_VERSION, > 0x0, 0x0, 0x0}; > > +int lj_file_crc32(const char* name, uint32_t* result) ^ Trailing whitespace here -------------------------------^ > +{ > +#define CRC_BUF_SIZE 1024 * 1024 Minor: It is better to use parentheses, when you define not one-token marco. Rationale: ~CRC_BUF_SIZE and ~(CRC_BUF_SIZE) return different values. (Of course, this example is a little bit crazy, but it shows the point). Also, why do we need 1 MB on the stack? I suppose that 512 bytes will be enough. > + > + char buf[CRC_BUF_SIZE] = ""; Don't get this line. Why do we need this initialization? > + size_t read_bytes = 0; This too. > + FILE* so = NULL; Typo: please use FILE *so instead FILE* so, considering our code style. > + > + if (result == NULL) { > + return 1; > + } > + > + *result = 0xffffffff; Please avoid the magic values in code. Also, I don't see any check for this value latter, so it looks excess. > + > + so = fopen(name, "rb"); > + if (!so) > + return 1; > + > + while (CRC_BUF_SIZE == (read_bytes = fread(buf, sizeof(*buf), CRC_BUF_SIZE, so))) { The brackets looks excess. Side note: why do you use "reversed order" i.e. constant == some_call()? > + *result = lj_utils_crc32(buf, CRC_BUF_SIZE, *result); ^ Trailing whitespace here -----------------------------------^ > + } > + ^^ Trailing whitespaces here. > + if(ferror(so) || !feof(so)) { > + fclose(so); > + return 1; > + } > + ^^ Trailing white spaces here. Side note: Please adjust your favorite editor to highlight trailing whytespaces. > + *result = lj_utils_crc32(buf, read_bytes, *result); > + fclose(so); > + return 0; > +} > + > +int write_shared_obj(struct dl_phdr_info *info, size_t size, void *data) I got the following warning during compilation: | /home/burii/reviews/luajit/csymbols/src/lj_memprof.c: In function 'write_shared_obj': | /home/burii/reviews/luajit/csymbols/src/lj_memprof.c:64:56: warning: unused parameter 'size' [-Wunused-parameter] | 64 | int write_shared_obj(struct dl_phdr_info *info, size_t size, void *data) Please, used the corresponding UNUSED() macro. > +{ > + uint32_t so_hash = 0; > + struct lj_wbuf *buf = data; > + > + if (info->dlpi_name && strlen(info->dlpi_name)) { > + if (0 != lj_file_crc32(info->dlpi_name, &so_hash)) > + // XXX: Maybe it is reasonable to print warning here I don't think so. This warning should be printed by the symtab parser. So we need to write this event, but with special hash value means "failed to calculate hash". OTOH, I don't know what we can do with hash collision. > + return 0; > + > + lj_wbuf_addbyte(buf, SYMTAB_SO); > + lj_wbuf_addstring(buf, info->dlpi_name); > + lj_wbuf_addu64(buf, info->dlpi_addr); > + lj_wbuf_addu64(buf, so_hash); > + } > + > + return 0; > +} > + > static void dump_symtab(struct lj_wbuf *out, const struct global_State *g) > { > const GCRef *iter = &g->gc.root; > @@ -49,6 +106,9 @@ static void dump_symtab(struct lj_wbuf *out, const struct global_State *g) > iter = &o->gch.nextgc; > } > > + /* Write shared libraries. */ > + dl_iterate_phdr(write_shared_obj, out); > + > lj_wbuf_addbyte(out, SYMTAB_FINAL); > } > > diff --git a/src/lj_memprof.h b/src/lj_memprof.h > index 3417475d..0cefc403 100644 > --- a/src/lj_memprof.h > +++ b/src/lj_memprof.h > @@ -16,7 +16,7 @@ > #include "lj_def.h" > #include "lj_wbuf.h" > > -#define LJS_CURRENT_VERSION 0x1 > +#define LJS_CURRENT_VERSION 0x2 It means that we need to upgrade parser with the same commit. Rationale: without this test will fail -- it complicates debugging with bisecting later (for now, tests fail on this commit). > > /* > ** symtab format: > @@ -51,6 +51,7 @@ > */ > > #define SYMTAB_LFUNC ((uint8_t)0) > +#define SYMTAB_SO ((uint8_t)1) Please, add entry for the corresponging events in symtab description. > #define SYMTAB_FINAL ((uint8_t)0x80) > > #define LJM_CURRENT_FORMAT_VERSION 0x01 > -- > 2.32.0 > Also, please add some tests to check the behaviour of your patch. [1]: https://github.com/tarantool/tarantool/wiki/Code-review-procedure#commit-message [2]: https://lists.tarantool.org/pipermail/tarantool-patches/2021-August/025707.html -- Best regards, Sergey Kaplun