Monday 6 February 2017

Ask for guidance on packaging sources with multiple libraries

Hi,
I'm reaching out to you to get some guidance on how to handle a soname versioning issue I haven't faced before.

TL;DR:
How do I correctly package a source that builds several libraries, each with its own ABI that is bumped separately, when those libraries depend on each other and an executable can therefore end up with mixed versions once ld.so has mapped in its dependencies?
This is a case I have no experience with.
I've thought about and discussed several solutions, but I'm not sure about any of them yet.
  • Compat packages with symlinks to the new .so version, as it has ABI backward-compat symbols
  • Hard-inserting the major version (currently 16.11) into every library version, e.g. making the new one librte_eal.so.16.11.3
  • Some magic combination of Breaks/Conflicts/Replaces that prevents the error window (lib updated, but not consumers) from occurring
  • Others according to your guidance
See further below for details and background on these options.

    ## Background ##

    - In the past DPDK had a single library, but without soname versioning (libdpdk.so.0)
    - Then they dropped the option for a single lib, but OTOH ABI tracking for individual libs was improved
    - This was then deb packaged as usual versioned library packages; librte_XXX<VER> in the dpdk case.
    - I'm unsure if upstream ABI tracking is "complete" given the case I outline below, yet I need to adapt in packaging no matter what.
    - To be clear, as soon as a consuming app got a no-change rebuild things are fine again as all versions match again.


    ## The sub-dependencies issue ##

    DPDK libraries serve many purposes, from memory allocation to packet processing.
    There are a few which can be considered "core" libraries like librte_eal and librte_ethdev.
    That means that most sublibraries will depend on those cores as well.

    Dependencies prior to the update (trimmed down lddtree -a on openvswitch):
    ovs-vswitchd-dpdk => /usr/lib/openvswitch-switch-dpdk/ovs-vswitchd-dpdk (interpreter => /lib64/ld-linux-x86-64.so.2)
        librte_pdump.so.1 => /usr/lib/x86_64-linux-gnu/librte_pdump.so.1
            librte_eal.so.2 => /usr/lib/x86_64-linux-gnu/librte_eal.so.2
        librte_eal.so.2 => /usr/lib/x86_64-linux-gnu/librte_eal.so.2

    Dependencies after the update:
    ovs-vswitchd-dpdk => /usr/lib/openvswitch-switch-dpdk/ovs-vswitchd-dpdk (interpreter => /lib64/ld-linux-x86-64.so.2)
        librte_pdump.so.1 => /usr/lib/x86_64-linux-gnu/librte_pdump.so.1
            librte_eal.so.3 => /usr/lib/x86_64-linux-gnu/librte_eal.so.3
        librte_eal.so.2 => /usr/lib/x86_64-linux-gnu/librte_eal.so.2

    That causes symbol collisions. On startup the program runs through some new init code (only in librte_eal3): rte_timer_init -> rte_delay_us_callback_register, which writes to a colliding symbol, "rte_delay_us".

    gdb: info symbol rte_delay_us
      rte_delay_us in section .text of /usr/lib/x86_64-linux-gnu/librte_eal.so.2

    And the types differ:
      eal2:
      void rte_delay_us(unsigned us)
      {
        const uint64_t start = rte_get_timer_cycles();
        const uint64_t ticks = (uint64_t)us * rte_get_timer_hz() / 1E6;
        while ((rte_get_timer_cycles() - start) < ticks)
            rte_pause();
      }
      eal3:
      void (*rte_delay_us)(unsigned int) = NULL;

    So the new ABI's code writes to what it believes is a function pointer in eal3, but instead writes to the text segment of eal2 -> segfault.
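
A mixed mapping like the one in the "after the update" listing can be spotted mechanically. The following is only a rough sketch, assuming lddtree-style "name => path" lines on stdin; the function name mixed_sonames is made up for this mail and is not part of any existing tool.

```shell
# Report libraries that show up with more than one soname version in
# lddtree-style output read from stdin (hypothetical helper).
mixed_sonames() {
    awk '
        # Only consider "name => path" lines whose name ends in .so.<version>
        $2 == "=>" && $1 ~ /\.so\.[0-9.]+$/ {
            name = $1
            base = name
            sub(/\.so\.[0-9.]+$/, "", base)   # strip ".so.N" to get the bare library name
            key = base SUBSEP name
            if (!(key in seen)) {             # count each distinct soname once
                seen[key] = 1
                count[base]++
                versions[base] = versions[base] " " name
            }
        }
        END {
            for (b in count)
                if (count[b] > 1)
                    print b ":" versions[b]
        }
    ' | sort
}
```

It could be fed with something like `lddtree -a /usr/lib/openvswitch-switch-dpdk/ovs-vswitchd-dpdk | mixed_sonames`; an empty result means no library was mapped in under two different sonames.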



    ## How could this happen ##

    In the DPDK world only those sub-libraries that get ABI changes get a bump.
    In the last case that was librte_eal, librte_cryptodev and libethdev (the latter also getting a rename).
    So the new source upload provides new builds of libraries whose ABI is unchanged, e.g. librte_pdump.so.1 in my example above.
    But since the new "librte_pdump.so.1" was built from the new source, it depends on the new core libraries like librte_eal3.
    On the other hand, the "old" Openvswitch itself still depends on the old librte_eal2 provided by the old package.
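
To see which sonames a given binary or library actually records, the NEEDED entries of its dynamic section can be listed. This is a small sketch that filters the output of `objdump -p`; the helper name needed_sonames is made up for illustration.

```shell
# Print the sonames recorded as NEEDED in "objdump -p" output read from stdin
# (hypothetical helper).
needed_sonames() {
    awk '$1 == "NEEDED" { print $2 }'
}
```

Running `objdump -p /usr/lib/x86_64-linux-gnu/librte_pdump.so.1 | needed_sonames` and the same for the ovs-vswitchd-dpdk binary should show the rebuilt library asking for librte_eal.so.3 while the not yet rebuilt binary still asks for librte_eal.so.2.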


    ## Maybe a special DPDK option around that ##

    There is an extra twist of DPDK to consider as an option.
    DPDK provides backward ABI compatibility via symbol maps.
    Not forever but at least for one version backwards (or more if maintainable).
    That said a new librte_eal.so.3 should be able to serve as librte_eal.so.2 via this compatibility layer.
    But I already made a failed assumption based on that, creating transitional packages for librte_eal2 that map to librte_eal3.
    That is not how library soname versions work, yet this DPDK compat feature lured me into it.
    But if done "right", that could be a librte_eal2 package containing a symlink to librte_eal.so.3 and depending on the librte_eal3 package.
    A quick check of that approach worked:
     $ mv /usr/lib/x86_64-linux-gnu/librte_eal.so.2 /usr/lib/x86_64-linux-gnu/backup-librte_eal.so.2
     $ ln -s /usr/lib/x86_64-linux-gnu/librte_eal.so.3 /usr/lib/x86_64-linux-gnu/librte_eal.so.2
     $ mv /usr/lib/x86_64-linux-gnu/libethdev.so.4 /usr/lib/x86_64-linux-gnu/backup-libethdev.so.4
     $ ln -s /usr/lib/x86_64-linux-gnu/librte_ethdev.so.5 /usr/lib/x86_64-linux-gnu/libethdev.so.4
    With that in place it started correctly.
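
For experimenting without touching /usr/lib directly, the same symlink trick can be wrapped in a small helper. This is just a sketch under my own assumptions: the function name make_compat_link is made up, and it deliberately refuses to replace a real (non-symlink) old library.

```shell
# Create a compat symlink <old soname> -> <new soname> inside a library
# directory (hypothetical helper, not part of any package).
make_compat_link() {
    libdir=$1 old=$2 new=$3
    # The new library has to be there, otherwise the link would dangle.
    [ -e "$libdir/$new" ] || { echo "missing $new" >&2; return 1; }
    # Never clobber a real old library; only a missing file or an old symlink.
    if [ -e "$libdir/$old" ] && [ ! -L "$libdir/$old" ]; then
        echo "$old is a real file, not replacing it" >&2
        return 1
    fi
    # Relative target, since both names live in the same directory.
    ln -sf "$new" "$libdir/$old"
}
```

For example, `make_compat_link /tmp/scratch-libs librte_eal.so.2 librte_eal.so.3` in a scratch copy of the libraries, with LD_LIBRARY_PATH pointing at that directory for the test run.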

    Yet DPDK releases every three months, so sometimes we skip versions in the distribution.
    That means there could be cases where even this mechanism fails. If this solution is good for now, I'd be happy, but I'd still like to clarify what we would need to do if it is no longer an option at some point in the future.


    ## Potential Solutions?: ##

    See the TL;DR at the top, but I'm not sure what to go for at the moment, so I hope for your guidance :-/


    ## Appendix ##

    (FYI Full Dependencies)


    --
    Christian Ehrhardt
    Software Engineer, Ubuntu Server
    Canonical Ltd