[Tarantool-patches] [PATCH] say: fix format for fiber()->fid

Wed Feb 24 11:28:55 MSK 2021

On Tue, Feb 23, 2021 at 11:47:44PM +0100, Vladislav Shpilevoy wrote:
> >>> diff --git a/src/lib/core/say.c b/src/lib/core/say.c
> >>> index cbd10e107..6138752a7 100644
> >>> --- a/src/lib/core/say.c
> >>> +++ b/src/lib/core/say.c
> >>> @@ -792,7 +792,7 @@ say_format_plain_tail(char *buf, int len, int level, const char *filename,
> >>>  	if (cord) {
> >>>  		SNPRINT(total, snprintf, buf, len, " %s", cord->name);
> >>>  		if (fiber() && fiber()->fid != FIBER_ID_SCHED) {
> >>> -			SNPRINT(total, snprintf, buf, len, "/%i/%s",
> >>> +			SNPRINT(total, snprintf, buf, len, "/%u/%s",
> >>>  				fiber()->fid, fiber_name(fiber()));
> >>
> >> I remember we had some issues about %u not being compatible with uint32_t so
> >> we did manual casts to 'unsigned'. Not sure though what was the commit/ticket,
> >> or if it really happened.
> > 
> > I suspect you rather mean uint64_t, which indeed better convert to
> > excplicit "long long".
> 
> No, I mean exactly uint32_t. You don't have a guarantee that it is
> the same as 'unsigned' AFAIK.

It is, on x86. And I really doubt if we ever meet an arch where int
is not 4 bytes long. Though in embedded world this might be a common
situation (but if we aim ourself to run tarantool here the output
issue gonna be a least problem).

> 
> > Strictly speaking if we really need to cover
> > compatibility we would _had_ to use PRIx macros which are so ugly :/
> 
> This is why I want us to use casts to 'unsigned' when we use %u. They
> are less ugly.

We will have to cover a bunch of already existing code for nothing :(
But OK, I'll add explicit cast.

> 
> > So no, uint32_t is known to be safe with %u format.
> 
> Is it written in the standard somewhere? I honestly don't know.

Standart for builtin types doesn't cover the size of machine words
as far as I remember because this would tie a hands for compiler developers
which have to deal with custom archs. It specifies the integer classes
covering and on x86 we know that we're safe. But as I said I'll add
explicit cast.

> > Side note: I found that we still use %u for some of uint64_t,
> > in particuar
> > 
> > 	if (req->term < raft->volatile_term) {
> > 		say_info("RAFT: the message is ignored due to outdated term - "
> > 			 "current term is %u", raft->volatile_term);
> > 		return 0;
> > 	}
> > 
> > probably should use %llu and explicit "long long" for long standing
> > servers where raft term would overflow 32 bit value.
> 
> Yes, looks like a bug.

Should not be critical, though. It is logging for debug mostly but we
should fix it.

> But a more appropriate fix would be to investigate why don't we have
> compile time checks for this. The bug in raft's say_info() is clearly
> an error and should have been alarmed at compilation.

We have kind of check

/** Internal function used to implement say() macros */
CFORMAT(printf, 5, 0) extern sayfunc_t _say;

still I must confess I don't know yet why it didn't trigger, need
to investigate.

> At least clang supports an attribute which you can set for a function
> and get printf-like checks and warnings in case of format issues.
> 
> We should adopt this: https://clang.llvm.org/docs/AttributeReference.html#id278
> 
> Somewhy I thought we already use it, but now I don't see any usages.
> It is easy to add, and helps to catch some weird stuff.
> 
> Lets do it in scope of this patch series to fix the issue entirely. Not
> just in one exploded place.

OK

> 
> >> I am fine with not using the cast as long as all platforms and compilers
> >> build flawlessly. Also it is ok for me if you will add an explicit cast in
> >> the next version of the patch. Up to you. The same below.
> > 
> > No, Vlad, I think we don't need explicit casts for 32bit values.
> 
> Do you have a proof for this? That it is safe. Where in the standard do they
> say that '%u' is exactly 4 bytes?

Already explained. I'll make it use explicit cast though I still
think it is redundant.

> > I pushed
> > an update to the branch gorcunov/gh-5846-fid-name and here is a new
> > changelog. Does it look better?
> 
> It seems you didn't really read my comment, did you? I said:
> 
> 	"Please, add a changelog file"
> 
> I didn't ask for more info in the commit message. This particular
> bug in this particular line was obvious.
> 
> But you didn't add a changelog **file**. It should be added here:
> https://github.com/tarantool/tarantool/tree/master/changelogs/unreleased

Ah, you mean this new tecique for changelogs. OK, misread your comment,
at first I thought to update commit message for better export into
changelog. Will do.

> 
> After more consideration I also realized we should not use uint32_t
> for fiber ids. It is monotonically growing. Which means it will overflow
> in the foreseeable future on a long-running instance.

 - it is explected to be overlowable from the beginning
 - the IDs are recycled

I've been talking to Kostya about a year ago once I discovered that
we're walking in cycle when reusing IDs. He assured me that this is
fine and to get a problem we have to allocate 4294967195 fibers at
once which will require ~1.5T of memory just to allocate fiber structure.
Hardy to happen. Still I think we at least should add a painc in case of
hash collision for future.

> For example, assume one fiber is started and recycled each millisecond
> (which is quite a small rate). Then uint32 will overflow in less than
> 50 days. Considering we have instances running for years (at least from
> what I heard from Mons), this is clearly a bug.
> 
> On the summary, it seems we need to
> 
> - Add compile-time format checks using clang __format__ attribute;
> - Fix all the wrong formats;

+1

> - Use 64bit ids for fibers. Unless I am wrong somewhere in my thoughts
>   above.

This is dubious. Personally I would prefer to have them as uint64_t too
now and for all, to simply forget about this potential issue. +4 bytes
to fiber structure seems to be acceptable trade off.

	Cyrill