From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp10.mail.ru (smtp10.mail.ru [94.100.181.92]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 120DB445320 for ; Fri, 24 Jul 2020 20:20:04 +0300 (MSK) Date: Fri, 24 Jul 2020 20:20:03 +0300 From: Sergey Ostanevich Message-ID: <20200724172003.GF894@tarantool.org> References: <20200721113451.25817-1-skaplun@tarantool.org> <20200721113717.22804-1-skaplun@tarantool.org> <20200723101525.GC894@tarantool.org> <20200724111829.GA4086@root> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20200724111829.GA4086@root> Subject: Re: [Tarantool-patches] [RFC] rfc: luajit metrics List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Sergey Kaplun Cc: tarantool-patches@dev.tarantool.org Hi! On 24 Jul 14:18, Sergey Kaplun wrote: > Hi! Thanks for the review! > > Mons makes no bones of mailing list so I cite his comments from > [1] here. > > On 23.07.20, Sergey Ostanevich wrote: > > Hi! > > > > Thanks for the patch! See my comments below. > > Otherwise LGTM. > > > > Sergos > > ... > > > + /* > > > + * Overall number of snap restores for all traces > > > > Snap restores needs some explanation. > > Ok, sounds reasonable. AFAIK number of snap restores equals to amount of > trace exits. May be this will be more informative for users. > Well. What this will tell to you as a user running a Lua code? Consider the number is grows - ??? And what if it shrinks?? I believe it is tighly bound to the total number of traces and the size of code. But what should it disclose to the user - how should it react, refactor its code? > > > + * "belonging" to the given jit_State since the last call > > > + * to luaM_metrics(). > > > + */ > > > + size_t jit_snap_restore; > > > + /* > > > + * Overall number of abort traces "belonging" to the given > > > + * jit_State since the last call to luaM_metrics(). > > > + */ > > > + size_t jit_trace_abort; > > > + /* Total size of all allocated machine code areas. */ > > > + size_t jit_mcode_size; > > > + /* Current amount of JIT traces. */ > > > + unsigned int jit_trace_num; > > > +}; > > > +``` > > > + > > > +One can see metrics are divided by the two types: global and incremental. > > > +Global metrics are collected throughout the platform uptime. Incremental > > > +metrics are reset after each `luaM_metrics()` call. > > As Mons said: > | Incremental metrics must not reset after each call (see practice from > | Linux kernel, for ex.). Rationale is simple: 2 different modules, both > | using this function, will ruin metrics to each other > > It's very important comment. With two modules that use metrics our > counters become just garbage. I suppose that we should use monotonic > increase counters for all "incremental" metrics. Users can count delta > between calls of `getmetrics` and analise a result. > > We can use uint64_t for storage of these metrics. If we astimate amount > of allocated objects as 40Gb per second it will be necessary 14 years of > instance uptime to overflow this counter. Also all logic corresponding > to overflow of the counter can be realized inside user code. > > > > + > > > +There is also a complement introduced for Lua space -- `misc.getmetrics()`. > > > +This function is just a wrapper for `luaM_metrics()` returning a Lua table with > > > +the similar metrics. Its usage is quite simple: > > > +``` > > > +$ ./src/tarantool > > > +Tarantool 2.5.0-267-gbf047ad44 > > > +type 'help' for interactive help > > > +tarantool> misc.getmetrics() > > > +--- > > > +- global: > > > + tabnum: 1812 > > > + gc_total: 1369927 > > > + strnum: 5767 > > > + jit_trace_num: 0 > > > + cdatanum: 89 > > > + jit_mcode_size: 0 > > > + udatanum: 17 > > > + incremental: > > > > incremental_from_last_call will be more descriptive? > > Mons: > | I'd consider rename global to absolute. All metrics are global (not > | scoped). But some are incremental counters and others are absolute > | current values. > > And also as far as we mustn't use updatable by call metrics, should we join all > metrics inside one table, shouldn't we? Sure, both comments are correct - we should report just absolute values, but mention this fact in documentation. > > > > + jit_snap_restore: 0 > > > + gc_freed: 2239391 > > > + strhash_hit: 53759 > > > + gc_steps_finalize: 0 > > > + gc_allocated: 3609318 > > > + gc_steps_atomic: 6 > > > + gc_steps_sweep: 296 > > > + gc_steps_sweepstring: 17920 > > > + jit_trace_abort: 0 > > > + strhash_miss: 6874 > > > > I wonder if those keys can be sorted? Consider, I want to find > > something by the name. Sorting helps greatly. > > It can be implement by user as simple as: > > | local metrics = xjit.getmetrics() > | local sorted = {} > | for k, v in pairs(metrics.incremental) do table.insert(sorted, k) end > | table.sort(sorted) > | for _, k in ipairs(sorted) do print(k, metrics.incremental[k]) end > | > | --[[ > | gc_allocated 8920 > | gc_freed 3528 > | gc_steps_atomic 0 > | gc_steps_finalize 0 > | gc_steps_pause 0 > | gc_steps_propagate 62 > | gc_steps_sweep 0 > | gc_steps_sweepstring 0 > | jit_snap_restore 0 > | jit_trace_abort 0 > | strhash_hit 118 > | strhash_miss 7 > | ]] > For me it is not that simple, frankly. Not in understanding - in use. Can we just incorporate the code into the getmetrics()? Thanks, Sergos > > > > + gc_steps_propagate: 10106 > > > + gc_steps_pause: 7 > > > +... > > > +``` > > > -- > > > 2.24.1 > > > > > Thoughts? > > -- > Best regards, > Sergey Kaplun