[Tarantool-discussions] Consider exporting symbols from libraries: small, msgpuck

Alexander Turenko alexander.turenko at tarantool.org
Mon Sep 7 01:42:15 MSK 2020


I have been collecting thoughts about ABI compatibility for a while and want
to share them. The main question I want to raise here: is it worth exposing
the APIs of msgpuck, small and other libraries as part of tarantool's module
API?

Problem
-------

A tarantool module (say, memcached) uses a library that is also used in
tarantool (say, small). Let's assume that tarantool and the module use
different versions of the library, and that the layout of some structure
differs between them: a non-last field was removed, or a field was added in
the middle.

 | tarantool executable
 | --------------------
 |
 | /* foo.h */
 |
 | struct foo {
 |     uint64_t bar;
 |     uint64_t baz;
 |     struct foo *next;
 | };
 |
 | void
 | foo_create(struct foo *foo, struct foo *next);
 |
 | /* foo.c */
 |
 | void
 | foo_create(struct foo *foo, struct foo *next)
 | {
 |     foo->bar = 0;
 |     foo->baz = 0;
 |     foo->next = next;
 | }

 | module dynamic library
 | ----------------------
 |
 | /* foo.h */
 |
 | struct foo {
 |     /* !! no bar !! */
 |     uint64_t baz;
 |     struct foo *next;
 | };
 |
 | void
 | foo_create(struct foo *foo, struct foo *next);
 |
 | /* foo.c */
 |
 | void
 | foo_create(struct foo *foo, struct foo *next)
 | {
 |     /* !! no foo->bar = 0 !! */
 |     foo->baz = 0;
 |     foo->next = next;
 | }

Let's look at how the breakage may occur.

After internal symbols in the tarantool executable were unhidden (see [1]), a
call to foo_create() from the module actually calls the function from the
tarantool executable. That function uses tarantool's layout of the structure,
so it sets the module's foo->next to NULL (`foo->baz = 0;`) and writes to
memory outside the structure bounds (`foo->next = next;`).
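
The overlap can be checked without any dynamic loading by putting both layouts
into one translation unit. This is only a sketch: the names foo_host and
foo_module are made up here so that two versions of struct foo can coexist in
one file.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout compiled into the tarantool executable (illustrative name). */
struct foo_host {
    uint64_t bar;
    uint64_t baz;
    struct foo_host *next;
};

/* Layout the module was compiled against: no 'bar' (illustrative name). */
struct foo_module {
    uint64_t baz;
    struct foo_module *next;
};

/* Non-zero if the host's write to 'baz' lands on the module's 'next'. */
static int
host_baz_overlaps_module_next(void)
{
    return offsetof(struct foo_host, baz) ==
           offsetof(struct foo_module, next);
}

/* Non-zero if the host's 'next' lies past the end of the module's struct. */
static int
host_next_is_out_of_bounds(void)
{
    return offsetof(struct foo_host, next) >= sizeof(struct foo_module);
}
```

Both helpers return non-zero for the layouts above, which matches the two
effects described in the paragraph.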

Note for myself: I would take extra care with inline functions in public
headers, although I have no example of a possible breakage in mind. Noted
here to think about it later.

Note: Some msgpuck symbols were exposed even before [1]. I guess that was done
to make them usable via LuaJIT FFI.

[1]: https://github.com/tarantool/tarantool/issues/2971

Background
----------

- Default on Linux: a symbol from the executable file is used.
- macOS behaves as if RTLD_DEEPBIND were used (from Vlad Sh.).
- See dlopen(3): RTLD_DEEPBIND (places a library's own symbols ahead of the
  global ones in the lookup order).
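
For illustration, a loader that wants a module to prefer its own copies of
symbols could pass RTLD_DEEPBIND to dlopen(). A sketch under assumptions:
load_module_deepbind() is a hypothetical helper and not tarantool's loader, and
RTLD_DEEPBIND is a glibc extension, so it is guarded by #ifdef.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* RTLD_DEEPBIND is a glibc extension */
#endif
#include <dlfcn.h>
#include <stddef.h>

/*
 * Hypothetical helper: open a module so that, where supported, its own
 * symbols are looked up before the ones already in the global scope.
 * Returns NULL on failure, as dlopen() does.
 */
static void *
load_module_deepbind(const char *path)
{
    int flags = RTLD_NOW | RTLD_LOCAL;
#ifdef RTLD_DEEPBIND
    flags |= RTLD_DEEPBIND;
#endif
    return dlopen(path, flags);
}
```

On platforms without the flag the helper falls back to a plain dlopen(), so
the lookup order stays the platform default.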

Known cases
-----------

- LTO and ASAN complain about this.

  https://github.com/tarantool/tarantool/issues/5001
  LTO fix: https://github.com/tarantool/tarantool/commit/36927e540549fbdfd156ac3518616dbf4642711f
  ASAN fix: https://github.com/tarantool/tarantool/commit/e8c72d4fe66ea94e357af2e527cb5cc4727f09da

- memcached fails on some tarantool versions.

  This case is almost the same as the abstract one described above: the symbol
  unhiding patch leads to the breakage.

  https://github.com/tarantool/memcached/issues/59

- box_txn_alloc() changes its behaviour.

  Not strictly related to the problem described above, but it is another
  tarantool public C API breakage. So it is related to the question below: how
  to test the API to prevent this kind of breakage.

  https://github.com/tarantool/memcached/issues/53

My questions
------------

- Shouldn't we expose the symbols of the small and msgpuck libraries from the
  tarantool executable and ship the corresponding header files?

- How to ensure that the exposed API / ABI is stable: one may compile with old
  headers, but get symbols from a newer executable at runtime.
  - Of course, we should test tarantool changes against external modules. But
    that is not a general ABI compatibility verification: some cases may not
    be covered by a module's tests, and there may be closed-source modules.
  - How good are existing ABI compatibility checkers? Say, [2].
    - Looks promising: at least the description suggests that the case above
      would be caught ('renamed fields').

- We should define rules for changing public API structures and functions.
  Having such a checklist makes life easier.
  - Many points belong here, but I'll highlight one that comes to mind (just
    so I don't forget it): we will possibly need padding at the end of public
    structures to be able to extend them. Or we should explicitly state that
    the size of a structure is not known at build time, so it may not be used
    in arrays or allocated on the stack. If there is no need to provide direct
    access to the first N fields (say, for performance reasons), we can just
    make it opaque.

- Can we just ship small / msgpuck header files and expose their symbols from
  tarantool? Or do we need a separate public API layer?
  - The former would oblige us to keep those libraries ABI compatible.
  - The latter would not: in that case the library should only be used as a
    static one.

- How about performance? Would building a module with its own copy of a
  library (like small or msgpuck), rather than using tarantool's one, give
  better performance thanks to inline functions and macros?

- Can we make bundling a library into a module safe using symbol renaming
  (say, some macro magic)?

- For one particular case, the use of fiber()->gc: can we expose some reduced
  API from tarantool and be happy?

[2]: https://lvc.github.io/abi-compliance-checker/
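
The 'macro magic' from the symbol renaming question above could look roughly
like this. It is a sketch, not an existing convention in small or msgpuck: the
mymod_ prefix and the MYMOD_* macros are hypothetical, and struct foo is the
abstract example from the Problem section.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical prefix chosen by the module that bundles the library. */
#define MYMOD_PREFIX mymod_
#define MYMOD_CONCAT_(a, b) a##b
#define MYMOD_CONCAT(a, b) MYMOD_CONCAT_(a, b)
#define MYMOD_SYM(name) MYMOD_CONCAT(MYMOD_PREFIX, name)

/*
 * In a real build this would come before including the bundled library's
 * headers, so that both declarations and definitions get renamed.
 */
#define foo_create MYMOD_SYM(foo_create)

struct foo {
    uint64_t baz;
    struct foo *next;
};

/* After preprocessing the emitted symbol is mymod_foo_create. */
void
foo_create(struct foo *foo, struct foo *next)
{
    foo->baz = 0;
    foo->next = next;
}
```

The module's copy no longer collides with foo_create() exported by the
executable. Structure types are not renamed by this, so objects still must not
be passed between the two copies of the library.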

Why I started the discussion?
-----------------------------

I want to implement a Lua API for key_def as an external module, which should
be based on a public C API (which in turn should be extended for this
purpose). The built-in key_def Lua module uses the fiber->gc region; region
functions are part of the small library.

Considering the version mismatch problems we have already encountered, I would
prefer to expose small library symbols from the tarantool executable and use
them in the module.

I found that just exposing relevant symbols does not shield us from ABI
breakage problems, so the questions above should be resolved (sooner or
later).

Mea culpa
---------

Well, I should google for 'how to write abi compatible libraries' and read
some articles, and I guess most of my questions will be gone. I wrote the
letter above just to formalize things for myself, but then found that it may
serve as the base for further discussions.

Forward ABI compatibility guidelines
------------------------------------

This section was added later, so it may contradict something written above.

Excerpts of useful info from different sources.

- https://gcc.gnu.org/onlinedocs/libstdc++/manual/abi.html
  - symbol versioning
    - when exactly is it needed? what problem does it solve?
  - policy: don't change anything, only add
  - separation of interface and implementation
    - how about macros that wrap sizeof() / alignof() calls?
  - testing
    - `make check-abi`
      - It just checks all symbols using sizeof(), alignof() and so on. I
        guess it also checks the list of symbols and each structure field.
      - Files (from gcc tarball):
        - libstdc++-v3/testsuite/Makefile.in
        - libstdc++-v3/testsuite/util/testsuite_abi_check.cc
        - libstdc++-v3/testsuite/util/testsuite_abi.{h,cc}
        - libstdc++-v3/libsupc++/cxxabi.h
        - <...>
    - `make check-c++` just runs the C++ standard library test suite. The
      idea of the ABI compatibility check is to run a testsuite from one
      version against another one.
    - http://abicheck.sourceforge.net/
      - It is linked from the page. Why? Is it used in GCC? Is it related to
        `make check-abi`? Is it just a recommendation?
      - It just verifies the list of symbols used by an executable file
        against private / unstable lists. It is not a ready-to-use tool for
        comparing one ABI against another.
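
The sizeof() / offsetof() style of checking mentioned above can be tried in
miniature. This is a sketch assuming a made-up public structure: pub_stat,
field_layout and layout_equal() are illustrative names, not part of module.h
or of the libstdc++ testsuite.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical public structure whose layout was frozen at some release. */
struct pub_stat {
    uint64_t rows;
    uint64_t bytes;
    uint32_t flags;
    uint32_t reserved;
};

/* Compile-time part: the build fails if the frozen layout changes. */
_Static_assert(sizeof(struct pub_stat) == 24, "pub_stat size changed");
_Static_assert(offsetof(struct pub_stat, rows) == 0, "rows moved");
_Static_assert(offsetof(struct pub_stat, bytes) == 8, "bytes moved");
_Static_assert(offsetof(struct pub_stat, flags) == 16, "flags moved");

/* Runtime part: one record per field, dumped by each version's headers. */
struct field_layout {
    const char *name;
    size_t offset;
    size_t size;
};

/* Returns 1 if both descriptions list the same fields at the same places. */
static int
layout_equal(const struct field_layout *a, const struct field_layout *b,
             size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (strcmp(a[i].name, b[i].name) != 0 ||
            a[i].offset != b[i].offset || a[i].size != b[i].size)
            return 0;
    }
    return 1;
}
```

A description produced with the old headers can then be compared against one
produced with the new headers; any removed, reordered or resized field makes
layout_equal() return 0.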

I went through several documents and, in brief, the best description is the
KDE project guidelines (they are often linked from other good sources):

https://community.kde.org/Policies/Binary_Compatibility_Issues_With_C%2B%2B
https://community.kde.org/Policies/Binary_Compatibility_Examples

Those other sources are (I looked at them only briefly):

https://gcc.gnu.org/onlinedocs/libstdc++/manual/abi.html
https://www.akkadia.org/drepper/dsohowto.pdf
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n1976.html
http://syrcose.ispras.ru/2009/files/02_paper.pdf
https://accu.org/content/conf2015/JonathanWakely-What%20Is%20An%20ABI%20And%20Why%20Is%20It%20So%20Complicated.pdf

What I need to think about: is it okay to use padding for a structure instead
of a d-pointer? How much padding is acceptable performance-wise? Is it okay at
all to have non-opaque structures (we have none in module.h now)?
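
To make the two options concrete, here is a sketch of both. All names
(pub_opts, pub_handle and their functions) are hypothetical, and the amount of
reserved space is arbitrary; nothing here is taken from module.h.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Option 1: public fields plus a reserved tail for future extension. */
struct pub_opts {
    uint32_t flags;
    uint32_t timeout_ms;
    /* Reserved for future fields; must be zero until they are defined. */
    uint64_t reserved[6];
};

/* Zeroes everything, so a newer library sees defaults in reserved space. */
static void
pub_opts_init(struct pub_opts *opts)
{
    memset(opts, 0, sizeof(*opts));
}

/* Option 2: d-pointer. The public part never changes its size. */
struct pub_handle_impl; /* visible only to the implementation */

struct pub_handle {
    struct pub_handle_impl *d;
};

/* Private part: free to grow or be reordered between versions. */
struct pub_handle_impl {
    uint32_t flags;
};

static struct pub_handle *
pub_handle_new(uint32_t flags)
{
    struct pub_handle *h = malloc(sizeof(*h));
    if (h == NULL)
        return NULL;
    h->d = malloc(sizeof(*h->d));
    if (h->d == NULL) {
        free(h);
        return NULL;
    }
    h->d->flags = flags;
    return h;
}

/* Accessors replace direct field access; this is the performance cost. */
static uint32_t
pub_handle_flags(const struct pub_handle *h)
{
    return h->d->flags;
}

static void
pub_handle_delete(struct pub_handle *h)
{
    free(h->d);
    free(h);
}
```

Padding keeps direct field access and stack allocation but fixes the maximum
size forever; the d-pointer costs an extra allocation and indirection but
leaves the private part unconstrained.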

Future updates
--------------

I have now investigated the area a bit and want to share certain
recommendations in the future:

- How to expose a non-opaque structure and keep it ABI compatible across
  different tarantool versions (padding and so on).
- How to write a Lua/C module that is able to use a feature from a new
  tarantool version, but works with reduced functionality on an old
  tarantool version (using dlsym()).
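
As a rough idea of the dlsym() approach from the last item: the module looks
a symbol up in the running executable and falls back when it is absent. This
is a sketch under assumptions; find_host_symbol() and call_if_available() are
hypothetical helpers, and the int (*)(void) signature is chosen only for the
example.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Signature of the optional feature; an assumption for this example. */
typedef int (*feature_fn)(void);

/* Looks 'name' up in the main program; returns NULL if it is not there. */
static void *
find_host_symbol(const char *name)
{
    /* dlopen(NULL, ...) returns a handle for the main program. */
    void *self = dlopen(NULL, RTLD_NOW);
    if (self == NULL)
        return NULL;
    void *sym = dlsym(self, name);
    dlclose(self);
    return sym;
}

/* Calls the feature if the host exports it, otherwise returns 'fallback'. */
static int
call_if_available(const char *name, int fallback)
{
    feature_fn fn = NULL;
    /* The cast idiom from dlsym(3) for storing into a function pointer. */
    *(void **)(&fn) = find_host_symbol(name);
    return fn != NULL ? fn() : fallback;
}
```

The module never references the new symbol directly, so it still loads on an
old tarantool and only the fallback path is taken there.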

I'll do it when time permits.

WBR, Alexander Turenko.

