From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from smtp44.i.mail.ru (smtp44.i.mail.ru [94.100.177.104])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by dev.tarantool.org (Postfix) with ESMTPS id 0695242EF5C
 for ; Tue, 23 Jun 2020 23:32:11 +0300 (MSK)
Received: by smtp44.i.mail.ru with esmtpa (envelope-from )
 id 1jnpaV-0003Ni-92
 for tarantool-patches@dev.tarantool.org; Tue, 23 Jun 2020 23:32:11 +0300
Received: by mail-lf1-f51.google.com with SMTP id t74so71761lff.2
 for ; Tue, 23 Jun 2020 13:32:11 -0700 (PDT)
MIME-Version: 1.0
References: <20200611002510.35349-1-huston.mavr@gmail.com>
 <7855a532-9877-3fef-4a52-c480b4509e4a@tarantool.org>
 <2e9c5d4a-4af1-ccb5-ef1d-4e245e62b8b7@tarantool.org>
 <676d7330-579f-7950-896e-ae2b3bf7df9a@tarantool.org>
In-Reply-To: <676d7330-579f-7950-896e-ae2b3bf7df9a@tarantool.org>
From: Yaroslav Dynnikov
Date: Tue, 23 Jun 2020 23:31:59 +0300
Content-Type: multipart/alternative; boundary="000000000000fe457f05a8c640a3"
Subject: Re: [Tarantool-patches] [PATCH] cmake: cleanup src/CMakeLists.txt
List-Id: Tarantool development patches
To: Vladislav Shpilevoy
Cc: tarantool-patches@dev.tarantool.org, Alexander Turenko

--000000000000fe457f05a8c640a3
Content-Type: text/plain; charset="UTF-8"

Hi, Vlad.

Sorry for the late answer. I was going to recheck ideas you've proposed,
but didn't have time for that. So my answer is based on my knowledge only.

On Sat, 20 Jun 2020, 02:39 Vladislav Shpilevoy, wrote:

> Hi! Thanks for the investigation!
>
> On 19/06/2020 15:02, Yaroslav Dynnikov wrote:
> > I've researched the linking process and here is what I've found.
> >
> > First of all, let's talk about symbols removal.
> > Here is an interesting note on
> > how it works: https://stackoverflow.com/questions/55130965/when-and-why-would-the-c-linker-exclude-unused-symbols
> >
> > One of the basic important concepts is an ELF section. It's a chunk of data
> > (assembly code in our case) that linker operates on. A section may contain code
> > of one or more functions (depending on build arguments), but it can't be split
> > during linking.
> >
> > Usually, when we build an object file with `gcc libx.c -c -o libx.o`, all
> > functions from libx.c go into the single ".text" section. This single object
> > file is usually archived with others into single libx.a, which is later
> > processed during main executable linking.
> >
> > By default (but I can't provide the proof), when we call `gcc -lx` linker
> > operates on object files from the archive - if at least one symbol is used,
> > whole .o (not .a) is included in the resulting binary. There are also two linker
> > options that influence this behavior. At first, there is `-Wl,--whole-archive`,
> > which makes it to include whole `.a` instead of `.o` granularity. Secondly, there is
> > `-Wl,--gc-sections` which could remove unused sections, but in basic example
> > it's useless since all symbols from .o belong to the same .text section. To make
> > `--gc-sections` have an effect one should compile object files with
> > `-ffunction-sections` flag. It'll generate a separate section for every function
> > so the linker could gc unused ones.
> >
> > See:
> > ```console
> > $ cat libx.c
> > #include <stdio.h>
> >
> > void fA() {
> >         printf("fA is here\n");
> > }
> > void fB() {
> >         printf("fB is here\n");
> > }
> > $ gcc libx.c -c -o libx.o -ffunction-sections
> > $ readelf -S libx.o | grep .text
> >   [ 1] .text             PROGBITS         0000000000000000  00000040
> >   [ 5] .text.fA          PROGBITS         0000000000000000  00000056
> >   [ 6] .rela.text.fA     RELA             0000000000000000  000002d8
> >   [ 7] .text.fB          PROGBITS         0000000000000000  0000006d
> >   [ 8] .rela.text.fB     RELA             0000000000000000  00000308
> > ```
> >
> > Now let's move to the `libbit` which Vlad mentioned.
> > I've investigated how compiler options influence the resulting binary. Unused
> > functions from bit.c are really removed, but only with Release flags, and here
> > is why:
> >
> > There are only 2 functions implemented in bit.c, and both are unused. All the
> > others are inlines in bit.h and externs from luajit. When tarantool is built in
> > debug mode, the inlining is off, so other modules truly link to the bit.o and
> > all symbols remain including unused functions. But if we specify -O2 flag,
> > inlining takes place, and all the symbols from bit.o become unused, so the
> > linker drops the whole object file.
>
> So do you mean, that if a library consists of more than 1 C file (and builds
> more than one .o file), and functions from some of them are not used, these
> .o files and their code will be removed?
>

Exactly. Moreover, it depends on the gcc command syntax.
Suppose we have libx.a with two object files: fA.o and fB.o. Function from fA
is used, and fB isn't. Then `gcc main.c libx.a` will produce binary with both
fA and fB. While `gcc main.c -lx` will include fA only, and unused fB will be
dropped. It's also mentioned in the SO question I linked in the previous
message.

> If it is true, it does not look like green light for the patch and requires
> more experiments.
> For example, try to add a new library with several C files,
> use only some of them in the Tarantool executable (just add to exports.h),
> and see if code from unused .o files is removed even when they are built into
> .a file.
>

I guess the solution will be adding the --whole-archive flag. But I'm not
sure yet.

>
> However, as I said in private - I am ok with pushing this patch now. I just
> don't see why is it necessary. Why is it so important to push it? Push just
> for push? Just to close an issue? It does not improve anything, EXPORT_LIST
> still may appear to be useful along with some other things I removed in 2971,
> when I didn't think of the static build.
>

You're right, it's not that important. I've already told Alex Barulev that
we'll postpone this cleanup until we finish with static build refactoring.

>
> > Finally, speaking about this patch, my proposal is to merge this PR as is.
> > And since we know how to manage linking, other problems can be solved separately
> > (if they ever occur).
> >
> > Best regards
> > Yaroslav Dynnikov
>
--000000000000fe457f05a8c640a3--