[Tarantool-discussions] Consider exporting symbols from libraries: small, msgpuck
Alexander Turenko
alexander.turenko at tarantool.org
Mon Sep 7 01:42:15 MSK 2020
I have been accumulating thoughts about ABI compatibility for some time
and want to share them. The main question I want to raise here: is it
worth exposing the APIs of msgpuck, small and other libraries through
tarantool's module API?
Problem
-------
A tarantool module (say, memcached) uses a library, which is also used in
tarantool (say, small). Let's assume that tarantool and the module use
different versions of the library. Say, the layout of some structure has
changed: a field other than the last one was removed, or a field was added in
the middle.
| tarantool executable
| --------------------
|
| /* foo.h */
|
| struct foo {
|     uint64_t bar;
|     uint64_t baz;
|     struct foo *next;
| };
|
| void
| foo_create(struct foo *foo, struct foo *next);
|
| /* foo.c */
|
| void
| foo_create(struct foo *foo, struct foo *next)
| {
|     foo->bar = 0;
|     foo->baz = 0;
|     foo->next = next;
| }
| module dynamic library
| ----------------------
|
| /* foo.h */
|
| struct foo {
|     /* !! no bar !! */
|     uint64_t baz;
|     struct foo *next;
| };
|
| void
| foo_create(struct foo *foo, struct foo *next);
|
| /* foo.c */
|
| void
| foo_create(struct foo *foo, struct foo *next)
| {
|     /* !! no foo->bar = 0 !! */
|     foo->baz = 0;
|     foo->next = next;
| }
Let's look at how a breakage may occur.
After internal symbols were unhidden in the tarantool executable (see [1]), a
call to foo_create() from the module actually resolves to the function in the
tarantool executable. That function is compiled against the three-field
layout, so it sets the module's foo->next to NULL (`foo->baz = 0;`) and
writes to memory outside the structure bounds (`foo->next = next;`).
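The mismatch can be made concrete with offsetof(). This is only a sketch:
struct foo_exe and struct foo_mod are hypothetical names that place the two
layouts of struct foo shown above side by side in one translation unit, which
a real build would never see.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout that the executable's foo_create() was compiled against. */
struct foo_exe {
    uint64_t bar;
    uint64_t baz;
    struct foo_exe *next;
};

/* Layout that the module was compiled against (no bar). */
struct foo_mod {
    uint64_t baz;
    struct foo_mod *next;
};

/*
 * Nonzero if the executable's `foo->baz = 0` store lands on the bytes
 * that the module treats as foo->next.
 */
int
exe_baz_clobbers_mod_next(void)
{
    return offsetof(struct foo_exe, baz) == offsetof(struct foo_mod, next);
}

/*
 * Nonzero if the executable's `foo->next = next` store ends past the
 * object that the module has allocated.
 */
int
exe_next_store_is_out_of_bounds(void)
{
    size_t store_end = offsetof(struct foo_exe, next) +
                       sizeof(struct foo_exe *);
    return store_end > sizeof(struct foo_mod);
}
```

Both functions return nonzero for these two layouts, which matches the two
effects described above: a zeroed next pointer and a write past the end of
the module's object.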
Note to self: I would take extra care with inline functions in public
headers, although I have no example of a possible breakage in mind.
Noted here to think about it later.
Note: Some msgpuck symbols were exposed even before [1]. I guess that was done
to use them via LuaJIT FFI.
[1]: https://github.com/tarantool/tarantool/issues/2971
Background
----------
- Default on Linux: the symbol from the executable file is used.
- macOS behaves as if RTLD_DEEPBIND were used (from Vlad Sh.).
- See dlopen(3): RTLD_DEEPBIND (places a symbol from a library before the
  global one in the lookup order).
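A minimal sketch of asking for that lookup order explicitly, assuming glibc
(RTLD_DEEPBIND is a GNU extension and is not available everywhere).
open_module() is a hypothetical helper, not a tarantool function, and this
is not how tarantool loads modules.

```c
#define _GNU_SOURCE /* RTLD_DEEPBIND is a glibc extension */
#include <dlfcn.h>
#include <stddef.h>

/*
 * Hypothetical helper: load a shared object and, when asked and when
 * the platform supports it, make the object prefer its own symbols
 * over the global ones.
 * Returns the dlopen() handle or NULL on failure (see dlerror()).
 */
void *
open_module(const char *path, int prefer_own_symbols)
{
    int flags = RTLD_NOW | RTLD_LOCAL;
#ifdef RTLD_DEEPBIND
    if (prefer_own_symbols)
        flags |= RTLD_DEEPBIND;
#else
    (void)prefer_own_symbols; /* no such flag on this platform */
#endif
    return dlopen(path, flags);
}
```

The flag only helps when the module carries its own copy of the library; it
does nothing for a module that expects to get the symbols from the
executable.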
Known cases
-----------
- LTO and ASAN complain about this.
  https://github.com/tarantool/tarantool/issues/5001
  LTO fix: https://github.com/tarantool/tarantool/commit/36927e540549fbdfd156ac3518616dbf4642711f
  ASAN fix: https://github.com/tarantool/tarantool/commit/e8c72d4fe66ea94e357af2e527cb5cc4727f09da
- memcached fails on some tarantool versions.
  This case is almost the same as the abstract one described above: the
  symbol unhiding patch leads to the breakage.
  https://github.com/tarantool/memcached/issues/59
- box_txn_alloc() changes its behaviour.
  Not strictly related to the problem described above, but it is another
  tarantool public C API breakage. So it is related to the question below:
  how to test the API to prevent this kind of breakage.
  https://github.com/tarantool/memcached/issues/53
My questions
------------
- Shouldn't we expose the symbols of the small and msgpuck libraries from the
  tarantool executable and ship the corresponding header files?
- How do we ensure that the exposed API / ABI is stable? One may compile
  against old headers, but get symbols from a newer executable at runtime.
  - Of course, we should test tarantool changes against external modules.
    But that is not a general ABI compatibility verification: some cases may
    not be covered by a module's tests, and there may be closed-source
    modules.
  - How good are the existing ABI compatibility checkers? Say, [2].
    - Looks promising: at least the description suggests that the case above
      would be caught ('renamed fields').
  - We should define rules on how to change public API structures and
    functions. Having such checklists makes life easier.
    - Many points belong here, but I'll highlight one that comes to mind
      (just so I don't forget it): we will possibly need padding at the end
      of public structures to be able to extend them. Or we explicitly state
      that the layout of a structure is not known at build time, so it may
      not be used in arrays or allocated on the stack. If there is no need to
      provide direct access to the first N fields (say, for performance
      reasons), we can just make it opaque.
- Can we just ship the small / msgpuck header files and expose their symbols
  from tarantool? Or do we need a separate public API layer?
  - The former would oblige us to keep those libraries ABI compatible.
  - The latter doesn't: this way the library should only be used as a static
    one.
- How about performance? Could building a module with its own copy of a
  library (like small or msgpuck), instead of using tarantool's one, give
  better performance thanks to inline functions and macros?
- Can we make bundling a library into a module safe using symbol renaming
  (say, some macro magic)?
- For the particular case of using fiber()->gc: can we expose some reduced
  API from tarantool and be happy?
[2]: https://lvc.github.io/abi-compliance-checker/
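To illustrate the symbol renaming question, here is a sketch of what such
'macro magic' could look like. The mymod_ prefix is a hypothetical choice of
the module author, and struct foo / foo_create() stand for the bundled
library from the abstract example above; none of this is an existing small
or msgpuck facility.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Rename map. It must be in effect before the bundled library's headers
 * and sources are compiled, with one line per function or global
 * variable that the library defines.
 */
#define foo_create mymod_foo_create

/* --- bundled copy of the library, source text unchanged --- */

struct foo {
    uint64_t baz;
    struct foo *next;
};

void
foo_create(struct foo *foo, struct foo *next);

void
foo_create(struct foo *foo, struct foo *next)
{
    foo->baz = 0;
    foo->next = next;
}
```

After preprocessing, the object file defines mymod_foo_create and has no
foo_create symbol, so there is nothing for the executable's foo_create to
override. Structure tags are not link-time symbols and need no renaming. The
cost is a rename map that has to be kept in sync with the library.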
Why did I start the discussion?
-------------------------------
I want to implement a Lua API for key_def as an external module, which should
be based on the public C API (which in turn should be extended for this
purpose). The built-in key_def Lua module uses the fiber->gc region; region
functions are part of the small library.
Considering the version mismatch problems we have already met in the past, I
would prefer to expose small library symbols from the tarantool executable
and use them in the module.
I found that just exposing relevant symbols does not shield us from ABI
breakage problems, so the questions above should be resolved (sooner or
later).
Mea culpa
---------
Well, I should google for 'how to write abi compatible libraries' and read
some articles; I guess most of my questions will then be gone. I wrote the
letter above just to formalize things for myself, but then found that it may
be used as the base for further discussions.
Forward ABI compatibility guidelines
------------------------------------
This section was added later, so it may contradict something written
above.
Excerpts of useful info from different sources.
- https://gcc.gnu.org/onlinedocs/libstdc++/manual/abi.html
  - symbol versioning
    - when exactly is it needed? what problem does it solve?
  - policy: don't change anything, only add
  - separation of interface and implementation
    - how about macros that wrap sizeof() / alignof() calls?
  - testing
    - `make check-abi`
      - It just checks all symbols using sizeof(), alignof() and so on. I
        guess it also checks the list of symbols and each structure field.
      - Files (from gcc tarball):
        - libstdc++-v3/testsuite/Makefile.in
        - libstdc++-v3/testsuite/util/testsuite_abi_check.cc
        - libstdc++-v3/testsuite/util/testsuite_abi.{h,cc}
        - libstdc++-v3/libsupc++/cxxabi.h
        - <...>
    - `make check-c++` just runs the C++ standard library test suite. The
      idea of the ABI compatibility check is to run a testsuite from one
      version against another one.
- http://abicheck.sourceforge.net/
  - It is linked from the page. Why? Is it used in GCC? Is it related to
    `make check-abi`? Is it just a recommendation?
  - It just verifies a list of symbols used by an executable file against
    private / unstable lists. It is not a ready-to-use tool for comparing
    one ABI against another.
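The sizeof() / alignof() style of check mentioned above can be sketched in
plain C11 with compile-time assertions. This is not what `make check-abi`
actually runs; struct pub_foo is a hypothetical public structure, and the
numbers are assumed to have been recorded from a released build on an LP64
platform such as x86-64 Linux.

```c
#include <assert.h>   /* static_assert */
#include <stdalign.h> /* alignof */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* Hypothetical public structure, as declared in the current headers. */
struct pub_foo {
    uint64_t bar;
    uint64_t baz;
    void *next;
};

/*
 * Frozen layout of the released version (assumed LP64 values). Any
 * change that moves, removes or inserts a field breaks the build here
 * instead of breaking a module at runtime.
 */
static_assert(sizeof(struct pub_foo) == 24, "pub_foo: size changed");
static_assert(alignof(struct pub_foo) == 8, "pub_foo: alignment changed");
static_assert(offsetof(struct pub_foo, bar) == 0, "pub_foo: bar moved");
static_assert(offsetof(struct pub_foo, baz) == 8, "pub_foo: baz moved");
static_assert(offsetof(struct pub_foo, next) == 16, "pub_foo: next moved");
```

Such a file catches layout changes only. It says nothing about removed
functions or changed behaviour, which is where a symbol list comparison and
a test suite run come in.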
I skimmed several documents and, in brief, the best description is the KDE
project guidelines (they are often linked from other good sources):
https://community.kde.org/Policies/Binary_Compatibility_Issues_With_C%2B%2B
https://community.kde.org/Policies/Binary_Compatibility_Examples
Those other sources are (I looked at them only briefly):
https://gcc.gnu.org/onlinedocs/libstdc++/manual/abi.html
https://www.akkadia.org/drepper/dsohowto.pdf
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n1976.html
http://syrcose.ispras.ru/2009/files/02_paper.pdf
https://accu.org/content/conf2015/JonathanWakely-What%20Is%20An%20ABI%20And%20Why%20Is%20It%20So%20Complicated.pdf
What I need to think about: is it okay to use padding for a structure instead
of a d-pointer? How much padding is okay performance-wise? Is it ever okay
to have non-opaque structures (we have none in module.h now)?
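The two options can be put side by side in a short sketch. All names here
(pub_opts_v1, pub_opts_v2, pub_handle and its functions) are hypothetical
and are not part of module.h; the amount of reserved space is arbitrary.

```c
#include <stdint.h>
#include <stdlib.h>

/* Option 1: a non-opaque structure with a reserved tail. */
struct pub_opts_v1 {
    uint32_t flags;
    uint32_t reserved[7]; /* must be zeroed by the caller */
};

/* A later release turns one reserved slot into a real field. */
struct pub_opts_v2 {
    uint32_t flags;
    uint32_t timeout_ms;  /* 0 (the old padding value) means default */
    uint32_t reserved[6];
};

/* The size and the offsets of the old fields must not change. */
_Static_assert(sizeof(struct pub_opts_v1) == sizeof(struct pub_opts_v2),
               "pub_opts: size changed between versions");

/* Option 2: an opaque handle (a d-pointer in KDE terms). */
struct pub_handle; /* only this declaration goes to the public header */

struct pub_handle *pub_handle_new(void);
uint64_t pub_handle_baz(const struct pub_handle *h);
void pub_handle_delete(struct pub_handle *h);

/* Private part: the layout is free to change between releases. */
struct pub_handle {
    uint64_t bar;
    uint64_t baz;
};

struct pub_handle *
pub_handle_new(void)
{
    return calloc(1, sizeof(struct pub_handle));
}

uint64_t
pub_handle_baz(const struct pub_handle *h)
{
    return h->baz;
}

void
pub_handle_delete(struct pub_handle *h)
{
    free(h);
}
```

Padding keeps direct field access and stack allocation, but the reserve is
finite and old callers must zero it. The opaque handle can change freely, at
the price of a heap allocation and a function call per access.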
Future updates
--------------
Now that I have investigated the area a bit, I want to share certain
recommendations:
- How to expose a non-opaque structure so that it stays ABI compatible across
  different tarantool versions (padding and so on).
- How to write a Lua/C module that is able to use a feature from a new
  tarantool version, but works with reduced functionality on an old
  tarantool version (using dlsym()).
I'll do it when time permits.
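Until then, here is a rough sketch of the dlsym() idea, assuming the host
executable exports its API symbols as tarantool does. find_host_symbol(),
call_feature_or_fallback() and the feature_fn signature are hypothetical
names made up for illustration.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical signature of the optional feature. */
typedef int (*feature_fn)(int);

/*
 * Look a symbol up in the running program: the executable itself and
 * the libraries loaded along with it. Returns NULL if it is absent.
 */
void *
find_host_symbol(const char *name)
{
    void *self = dlopen(NULL, RTLD_NOW); /* handle of the main program */
    if (self == NULL)
        return NULL;
    void *sym = dlsym(self, name);
    dlclose(self);
    return sym;
}

/*
 * Call the feature when the host provides it, otherwise return
 * `fallback` (the reduced functionality path).
 */
int
call_feature_or_fallback(const char *name, int arg, int fallback)
{
    void *sym = find_host_symbol(name);
    if (sym == NULL)
        return fallback;
    feature_fn fn;
    *(void **)(&fn) = sym; /* POSIX idiom for the dlsym() result */
    return fn(arg);
}
```

A module would typically do the lookup once at load time and cache the
pointer. Note that the check proves only that the symbol exists; it cannot
tell whether its signature matches what the module expects.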
WBR, Alexander Turenko.