From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from smtp20.mail.ru (smtp20.mail.ru [94.100.179.251])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by dev.tarantool.org (Postfix) with ESMTPS id 727FA42EF5C
	for ; Fri, 19 Jun 2020 16:03:01 +0300 (MSK)
Received: by smtp20.mail.ru with esmtpa (envelope-from )
	id 1jmGfc-0006Mr-RC
	for tarantool-patches@dev.tarantool.org; Fri, 19 Jun 2020 16:03:01 +0300
Received: by mail-lf1-f44.google.com with SMTP id d21so3420666lfb.6
	for ; Fri, 19 Jun 2020 06:03:00 -0700 (PDT)
MIME-Version: 1.0
References: <20200611002510.35349-1-huston.mavr@gmail.com>
	<7855a532-9877-3fef-4a52-c480b4509e4a@tarantool.org>
	<2e9c5d4a-4af1-ccb5-ef1d-4e245e62b8b7@tarantool.org>
In-Reply-To: <2e9c5d4a-4af1-ccb5-ef1d-4e245e62b8b7@tarantool.org>
From: Yaroslav Dynnikov
Date: Fri, 19 Jun 2020 16:02:48 +0300
Message-ID:
Content-Type: multipart/alternative; boundary="0000000000003d93c905a86f831e"
Subject: Re: [Tarantool-patches] [PATCH] cmake: cleanup src/CMakeLists.txt
List-Id: Tarantool development patches
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
To: Vladislav Shpilevoy
Cc: tarantool-patches@dev.tarantool.org, Alexander Turenko

--0000000000003d93c905a86f831e
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

I've researched the linking process and here is what I've found.

First of all, let's talk about symbol removal. Here is an interesting note on
how it works:
https://stackoverflow.com/questions/55130965/when-and-why-would-the-c-linker-exclude-unused-symbols

One of the basic concepts is an ELF section. It's a chunk of data (machine
code in our case) that the linker operates on. A section may contain the code
of one or more functions (depending on build arguments), but it can't be split
during linking.

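A small illustration of the "depending on build arguments" remark. This is my
own sketch, not something taken from the tarantool tree. Besides build flags,
GCC and Clang also let the source pick a section per function through the
`section` attribute. The names `fA`, `fB`, `DEMO_OWN_SECTION` and
`.text.demo_fB` are hypothetical, and the attribute is applied only on ELF
targets.

```c
/* sections.c: hypothetical demo file, not part of tarantool. */

#if defined(__ELF__)
/* GCC/Clang extension: place the function into the named section. */
#define DEMO_OWN_SECTION __attribute__((section(".text.demo_fB")))
#else
/* Non-ELF target: keep the compiler's default placement. */
#define DEMO_OWN_SECTION
#endif

/*
 * No attribute: the code goes wherever the compiler puts it by default,
 * i.e. .text unless a flag such as -ffunction-sections says otherwise.
 */
int fA(void)
{
        return 1;
}

/* Requested to live in its own section regardless of build flags. */
DEMO_OWN_SECTION int fB(void)
{
        return 2;
}
```

Compiling it with `gcc -c sections.c -o sections.o` and running
`readelf -S sections.o | grep text` should list `.text.demo_fB` next to
`.text` on an ELF system. Either way, the linker keeps or drops each of those
sections as a whole.
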
Usually, when we build an object file with `gcc libx.c -c -o libx.o`, all
functions from libx.c go into a single ".text" section. This object file is
usually archived with others into a single libx.a, which is later processed
when the main executable is linked.

By default (though I can't provide proof), when we call `gcc -lx` the linker
operates on object files from the archive: if at least one symbol is used, the
whole .o (not the whole .a) is included in the resulting binary. There are
also two linker options that influence this behavior. First, there is
`-Wl,--whole-archive`, which makes the linker include the whole `.a` instead
of working at `.o` granularity. Second, there is `-Wl,--gc-sections`, which
can remove unused sections, but in the basic example it's useless since all
functions from the .o belong to the same .text section. To make
`--gc-sections` have an effect, one should compile object files with the
`-ffunction-sections` flag. It generates a separate section for every
function so that the linker can gc the unused ones.

See:
```console
$ cat libx.c
#include <stdio.h>

void fA() {
        printf("fA is here\n");
}
void fB() {
        printf("fB is here\n");
}
$ gcc libx.c -c -o libx.o -ffunction-sections
$ readelf -S libx.o | grep .text
  [ 1] .text             PROGBITS         0000000000000000  00000040
  [ 5] .text.fA          PROGBITS         0000000000000000  00000056
  [ 6] .rela.text.fA     RELA             0000000000000000  000002d8
  [ 7] .text.fB          PROGBITS         0000000000000000  0000006d
  [ 8] .rela.text.fB     RELA             0000000000000000  00000308
```

Now let's move to the `libbit` which Vlad mentioned.
I've investigated how compiler options influence the resulting binary. Unused
functions from bit.c are really removed, but only with Release flags, and here
is why:

There are only 2 functions implemented in bit.c, and both are unused. All the
others are inlines in bit.h and externs from luajit. When tarantool is built in
debug mode, inlining is off, so other modules truly link to bit.o and
all symbols remain, including the unused functions.
But if we specify the -O2 flag, inlining takes place and all the symbols from
bit.o become unused, so the linker drops the whole object file.

Finally, speaking about this patch, my proposal is to merge this PR as is.
And since we know how to manage linking, other problems can be solved
separately (if they ever occur).

Best regards
Yaroslav Dynnikov

On Thu, 18 Jun 2020 at 02:09, Vladislav Shpilevoy <v.shpilevoy@tarantool.org> wrote:

> On 17/06/2020 17:29, Mavr Huston wrote:
> >
> > EXPORT_LIST contains the following libraries in case of a static build
> > (with a normal build it's empty):
> >
> > /usr/lib/x86_64-linux-gnu/libreadline.so
> > /usr/lib/x86_64-linux-gnu/libcurses.so
> > /usr/lib/x86_64-linux-gnu/libform.so
> > /usr/lib/x86_64-linux-gnu/libtinfo.so
> > /usr/lib/x86_64-linux-gnu/libz.so
> > /opt/local/lib/libssl.so
> > /opt/local/lib/libcrypto.so
> > /usr/lib/x86_64-linux-gnu/libz.so
> > /opt/local/lib/libicui18n.so
> > /opt/local/lib/libicuuc.so
> > /opt/local/lib/libicudata.so
> >
> > It doesn't contain libcurl because it's bundled statically. So it isn't
> > related to https://github.com/tarantool/tarantool/issues/4559 but this
> > problem may be solved with the next patch:
> > https://github.com/tarantool/tarantool/tree/rosik/refactor-static-build.
> > That patch adds the flag --disable-symbol-hiding
> > (https://github.com/tarantool/tarantool/blob/rosik/refactor-static-build/cmake/BuildLibCURL.cmake#L93)
> > when building libcurl, and after that most libcurl symbols are visible
> > from the tarantool binary
>
> The problem is not in the hiding. It is about removal. Unused symbols from
> static libs may be removed from the final executable. Hide or not hide
> rules are applied to what is left. That is the single reason why we had the
> exports file before 2971 and have exports.h and exports.c now. You can try
> it by yourself - just add an unused function to lib/bit, to bit.h and bit.c.
> And don't use it anywhere.
> You may even add 'export' to it, or change visibility rules using
> __attribute__ - it does not matter. If the function is not used and is not
> added to exports.h, you won't see it in the executable. (At least it was so
> last time I tried; it does not work with all libs, but with lib/bit it
> worked.)
>
> Seems EXPORT_LIST was used to extract all symbols from the static libs and
> force their exposure + forbid their removal. Here the symbols were
> retrieved:
>
> https://github.com/tarantool/tarantool/commit/03790ac5510648d1d9648bb2281857a7992d0593#diff-6b9c867c54f1a1b792de45d5262f1dcfL20-L25
>
> Here the libs were passed to mkexports:
>
> https://github.com/tarantool/tarantool/commit/03790ac5510648d1d9648bb2281857a7992d0593#diff-95e351a3805a1dafa85bf20b81d086e6L253-L260
>
> We probably should resurrect that part. Rename the current exports.h to
> exports.h.in and generate exports.h during cmake, like it was done for
> exports before 2971. To forbid symbols removal. Not to unhide them.
>

--0000000000003d93c905a86f831e
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
--0000000000003d93c905a86f831e--