The HRT Beat

How Our Engineers Hot-Patched a Third Party Binary Library
Published: Dec 7, 2023

At HRT, low-level engineers tackle all kinds of technical challenges as a team. This post documents a quite involved but particularly fun one. We had to delve into the intricacies of the system and learned a lot along the way. We hope you will too.

Introduction

Some time ago, HRT needed a production bug fixed in a third-party, binary-only shared library. The issue was quite pressing. However, it can take a lot of time to put together a reproducer (if that is possible at all), send it to a vendor, and get an updated version. Luckily, we quickly discovered that the newest version of this library fixed the issue we were experiencing. This was going to be an easy fix, or so we thought! Throughout this post, we’ll call our current version of the library libfoo.so.1.1 and the new one libfoo.so.1.2.

Note that we run Linux on 64-bit x86 computers.

A primer on shared library versioning

Shared libraries are usually linked to an application at runtime. However, if an application was built with one version of the library but an incompatible one is provided at runtime, the software will exhibit undefined behavior sooner or later.

To detect this kind of issue, shared libraries come with a version string called a soname [1]. A change in soname means that the library is not backwards compatible. In other words, the application binary interfaces (ABIs) of the build-time and runtime libraries are not compatible.

At build time, applications are linked against the soname of the library. During a library installation, ldconfig creates a symlink named after the soname that points to the actual library file. Let’s look at an example:

$ ldd /bin/ls | grep libc
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fce0df8ce00)
$ ls -l /lib/x86_64-linux-gnu/libc.so.6 
lrwxrwxrwx 1 root root 12 Feb 20  2023 /lib/x86_64-linux-gnu/libc.so.6 -> libc-2.31.so*
$ readelf -d /lib/x86_64-linux-gnu/libc-2.31.so | grep SONAME
 0x000000000000000e (SONAME)             Library soname: [libc.so.6]

As you can see, ls is linked against a libc.so.6 symlink which points to the libc-2.31.so file (corresponding to glibc 2.31 on this machine). Our distribution ships only libc-2.31.so in its glibc package. When the package is installed, ldconfig runs and creates the libc.so.6 symlink.

If we later install, say, glibc 2.32 that ships with the same soname (libc.so.6), nothing needs to be done beyond pointing the libc.so.6 symlink to libc-2.32.so. The soname is the same, which means the library author has promised us that glibc 2.32 is backward compatible with 2.31.

This is a very simple scheme that relies on a string in the binary and symlinks. No runtime check is actually done, so it’s quite important for libraries to have the correct soname. If there is a mismatch, software will crash in the most head-scratching way.
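The symlink half of this scheme can be tried out without touching any real library. The sketch below is only an illustration: the libdemo names and the repoint helper are made up for the example, and the "libraries" are placeholder files rather than ELF objects. It shows that an upgrade under an unchanged soname amounts to repointing one symlink.

```python
import os
import tempfile


def repoint(link_path, target_name):
    """Replace the symlink at link_path so that it points at target_name."""
    tmp_path = link_path + ".tmp"
    os.symlink(target_name, tmp_path)   # build the new link next to the old one
    os.replace(tmp_path, link_path)     # rename it over the old link in one step


def demo():
    """Return what the soname link resolves to before and after an upgrade."""
    with tempfile.TemporaryDirectory() as libdir:
        # Two placeholder "library" files; they are not real ELF objects.
        for real_name in ("libdemo-2.31.so", "libdemo-2.32.so"):
            with open(os.path.join(libdir, real_name), "wb") as handle:
                handle.write(b"placeholder")

        # The name applications would have recorded at build time.
        soname_link = os.path.join(libdir, "libdemo.so.6")

        repoint(soname_link, "libdemo-2.31.so")
        before = os.path.basename(os.path.realpath(soname_link))

        repoint(soname_link, "libdemo-2.32.so")
        after = os.path.basename(os.path.realpath(soname_link))
        return before, after


if __name__ == "__main__":
    print(demo())
```

Nothing in this flow inspects the contents of the files, which mirrors the point above: the scheme trusts the soname string and the symlink, with no runtime compatibility check.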

Back to our library

Surprisingly, when we installed libfoo.so.1.2, our software would not start: the new library had a different soname. If we wanted to do things properly, this meant we had to rebuild all the involved software and potentially build two different versions of the library during the testing phase. This would add a significant amount of work and time to build, test, and deploy the fix.

Oddly enough, the new soname contained no version number. The previous soname was libfoo.so.1 and the new one was libfoo.so. It was quite strange to drop the version number; typically a new soname would have been libfoo.so.2. Unexpectedly, we also noticed that the new version was actually backwards compatible, so no change of soname was even necessary! We could have renamed the file from libfoo.so.1.2 to libfoo.so.1 to work around the soname change, but that would have prevented building against this library, as newly linked software would be looking for libfoo.so. All of this would have resulted in a pretty janky setup.

Our setup would be greatly simplified if we could modify the soname of the new library to match the previous one. We reached out to our vendor, but we couldn’t expect the kind of fast turnaround we’d like. As a result, we decided we would try to modify the soname of the binary to be libfoo.so.1 as it should be.

Attempting a quick-and-dirty binary edit

As mentioned earlier, the soname is simply a string inside the binary. Could it be possible to just add the missing “.1” to our new library soname? Maybe it was followed by some padding bytes that we could overwrite? A programmer can dream! First we have to understand a little about the binary format used by shared libraries.

Shared libraries on Linux are binary ELF files. This is quite a complex format. It’s divided into segments that contain different sections that refer to each other through offsets.

Each section of the file may or may not be mapped into memory at runtime. Sections that follow each other in the file may be mapped at different places in memory at runtime. There are all kinds of headers and offsets to keep track of that.

The first order of business was to figure out where in the binary the soname is stored. If you’re lucky (and we were here), the string that contains the soname will be present only once, so finding its location is relatively easy.

First, print the string table section used for dynamic linking (called .dynstr) with readelf -p .dynstr file and grep for the soname. This shows the offset of the soname string within the .dynstr section. Then find the file offset at which this section starts and add the two together to get the location of the soname in the file [2].

$ readelf -p .dynstr libfoo.so.1.2  | grep libfoo.so
[  17ac]  libfoo.so
$ readelf -S libfoo.so.1.2 | grep dynstr
  [ 4] .dynstr           STRTAB           0000000000003f00  00003f00
$ printf "0x%x\n" $((0x17ac + 0x3f00))
0x56ac
$ xxd -s 0x56ac libfoo.so.1.2 | head -2
000056ac: 6c69 6266 6f6f 2e73 6f00 474c 4942 435f  libfoo.so.GLIBC_
000056bc: 322e 3300 474c 4942 435f 322e 3134 0047  2.3.GLIBC_2.14.G

We could see that our “libfoo.so” soname is tightly packed with other strings. It already felt like our hack was likely to fail, but we might as well try. We simply overwrote the three bytes from 0x56b5 through 0x56b7 with “.1” followed by a NUL terminator:

$ xxd -s 0x56ac libfoo.so.1.2 | head -2
000056ac: 6c69 6266 6f6f 2e73 6f2e 3100 4942 435f  libfoo.so.1.IBC_
000056bc: 322e 3300 474c 4942 435f 322e 3134 0047  2.3.GLIBC_2.14.G
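The arithmetic behind those addresses is small enough to write down. The sketch below only restates numbers that appear in the readelf and xxd output above; the constant names are ours and are not part of any tool.

```python
# Values copied from the readelf output shown above.
DYNSTR_FILE_OFFSET = 0x3F00   # last column of `readelf -S` for .dynstr
SONAME_STR_OFFSET = 0x17AC    # offset printed by `readelf -p .dynstr`

OLD_SONAME = b"libfoo.so"
SUFFIX = b".1\x00"            # the two new characters plus a NUL terminator

# Where the soname string starts in the file.
soname_start = DYNSTR_FILE_OFFSET + SONAME_STR_OFFSET

# The old NUL terminator sits right after the last character of the string,
# and that is the first byte the patch has to overwrite.
patch_first = soname_start + len(OLD_SONAME)
patch_last = patch_first + len(SUFFIX) - 1

print(hex(soname_start), hex(patch_first), hex(patch_last))
```

The patched range starts on the old terminator, so two of the three bytes written belong to whatever string comes next in the table.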

You might wonder why we were replacing existing bytes instead of inserting. The reason is that ELF is a binary format that refers to other sections (such as pointing to specific strings like the soname) by offsets. This means that if we were to insert anything, we’d have to find and rewrite many of these offsets. This is not something that can be done manually.

This first attempt seemed to succeed. We could see the modified soname. When we tried to run ldconfig, it created the proper soname symlink.

$ readelf -d libfoo.so.1.2 | grep SONAME
 0x000000000000000e (SONAME)             Library soname: [libfoo.so.1]

However, when we tried to run our software with this library, it would not start due to a missing symbol version! Let’s look at the error and dump the version needs information of the patched and unpatched libraries:

$ ./oursoftware
./oursoftware: /lib64/ld-linux-x86-64.so.2: version `1' not found (required by libfoo.so.1)

$ readelf -V libfoo.so.1.2 | grep -A3 "Version needs" 
Version needs section '.gnu.version_r' contains 2 entries:
 Addr: 0x00000000000059e8  Offset: 0x0059e8  Link: 4 (.dynstr)
  000000: Version: 1  File: ld-linux-x86-64.so.2  Cnt: 1
  0x0010:   Name: 1  Flags: none  Version: 5

$ readelf -V libfoo.so.1.2.old | grep -A3 "Version needs" 
Version needs section '.gnu.version_r' contains 2 entries:
 Addr: 0x00000000000059e8  Offset: 0x0059e8  Link: 4 (.dynstr)
  000000: Version: 1  File: ld-linux-x86-64.so.2  Cnt: 1
  0x0010:   Name: GLIBC_2.3  Flags: none  Version: 5

The unpatched library had a dependency on the GLIBC_2.3 symbol version, which became 1 in our patched file. As noted above, the ELF format uses offsets to refer to things, and it appears that the bytes we replaced were still referred to. We had replaced GL with 1\x0, so the linker thought that it needed to find a version named 1 in ld-linux-x86-64.so.2 instead of GLIBC_2.3.
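The effect is easy to reproduce on a toy string table. The sketch below is a model, not a parser for real ELF files: the table layout and the read_cstring helper are assumptions made for the illustration, and the offsets differ from those in libfoo. It applies the same three-byte overwrite and then reads back both strings through their original offsets.

```python
def read_cstring(table, offset):
    """Return the NUL-terminated ASCII string that starts at offset."""
    end = table.index(b"\x00", offset)
    return bytes(table[offset:end]).decode("ascii")


def simulate_overwrite():
    # A miniature .dynstr: strings are packed back to back, NUL-separated.
    table = bytearray(b"\x00libfoo.so\x00GLIBC_2.3\x00GLIBC_2.14\x00")

    # Other structures refer to strings by offset into this table.
    soname_off = table.index(b"libfoo.so")
    version_off = table.index(b"GLIBC_2.3")

    # Overwrite the old terminator and the next two bytes, as in the post.
    patch_at = soname_off + len(b"libfoo.so")
    table[patch_at:patch_at + 3] = b".1\x00"

    return read_cstring(table, soname_off), read_cstring(table, version_off)


if __name__ == "__main__":
    print(simulate_overwrite())
```

The soname now reads as intended, but anything still holding the old GLIBC_2.3 offset lands on the "1" that was written over the "G", followed immediately by the new terminator.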

This was getting complicated! Surely there was a tool that we could use.

Trying patchelf

Patchelf is a nifty tool that allows you to modify binary ELF files (such as shared libraries). In particular, it can also change the soname of a library.

We ran patchelf on our shared library passing the old library’s soname. The readelf command above confirmed it now had the previous soname.

$ readelf -d libfoo.so.1.2 | grep -i soname
 0x000000000000000e (SONAME)             Library soname: [libfoo.so]
$ patchelf --set-soname libfoo.so.1 libfoo.so.1.2
$ readelf -d libfoo.so.1.2 | grep -i soname
 0x000000000000000e (SONAME)             Library soname: [libfoo.so.1]

We high-fived and confidently started to test the new library, almost certain that we were done. We fired up ldd to see if our library was getting picked up:

$ ldd ./oursoftware
./oursoftware: error while loading shared libraries: libfoo.so.1: ELF load command address/offset not properly aligned

Let’s see what patchelf did to our library. First we noticed that it appended the new soname to the .dynstr section (instead of modifying the existing string) and moved that section to the end of the binary. The relocated section is loaded through a new program header:

$ readelf -p .dynstr libfoo.so.1 | grep libfoo.so
  [  194c]  libfoo.so
  [  1a0c]  libfoo.so.1
$ readelf -S libfoo.so.1 | grep dynstr
  [29] .dynstr           STRTAB           0000000000402ac8  0004aac8
$ readelf -l libfoo.so.1
(...)
Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
(...)
  LOAD           0x000000000004a000 0x0000000000400000 0x0000000000400000
                 0x0000000000002588 0x0000000000002588  RW     0x200000

The 3rd party library had its LOAD segments configured to be loaded on a 2MiB boundary (0x200000) and patchelf did not handle that gracefully. Since this didn’t matter to us here, we simply modified patchelf to forcibly change the alignment back to a regular page size.

--- a/src/patchelf.cc
+++ b/src/patchelf.cc
@@ -897,7 +897,7 @@ void ElfFile<ElfFileParamNames>::rewriteSectionsLibrary()
         wri(phdr.p_vaddr, wri(phdr.p_paddr, startPage));
         wri(phdr.p_filesz, wri(phdr.p_memsz, neededSpace));
         wri(phdr.p_flags, PF_R | PF_W);
-        wri(phdr.p_align, alignStartPage);
+        wri(phdr.p_align, getPageSize());
     }
 
     normalizeNoteSegments();
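The loader's complaint can be checked by hand against the program header printed earlier. As far as we understand the glibc loader, it requires the file offset and the virtual address of a LOAD segment to be congruent modulo the segment's alignment. The sketch below encodes that rule; the function name is ours, and the 4 KiB page size is an assumption about a typical x86-64 system.

```python
def load_segment_is_consistent(p_offset, p_vaddr, p_align):
    """True when p_offset and p_vaddr are congruent modulo p_align."""
    return (p_vaddr - p_offset) % p_align == 0


# Values of the segment that patchelf appended (from `readelf -l` above).
NEW_SEGMENT_OFFSET = 0x4A000
NEW_SEGMENT_VADDR = 0x400000

VENDOR_ALIGN = 0x200000   # 2 MiB, inherited from the vendor's LOAD segments
PAGE_SIZE = 0x1000        # assumption: regular 4 KiB pages

with_vendor_align = load_segment_is_consistent(
    NEW_SEGMENT_OFFSET, NEW_SEGMENT_VADDR, VENDOR_ALIGN)
with_page_align = load_segment_is_consistent(
    NEW_SEGMENT_OFFSET, NEW_SEGMENT_VADDR, PAGE_SIZE)

print(with_vendor_align, with_page_align)
```

With the 2 MiB alignment the two values differ by 0x3b6000, which is not a multiple of 0x200000, so the segment is rejected. With a page-sized alignment the same offset and address pass.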

Were we done? Our hopes were quickly dashed. When we ran ldconfig, it failed by printing “file libfoo.so.1.2 is truncated.” 🫤

ldconfig was failing while trying to load the dynamic string section that patchelf had just added. Let’s see what it expects:

/* Find the string table. */
dynamic_strings = NULL;
for (dyn_entry = dynamic_segment; dyn_entry->d_tag != DT_NULL; ++dyn_entry)
{
  check_ptr (dyn_entry);
  if (dyn_entry->d_tag == DT_STRTAB)
    {
      /* ... */
      dynamic_strings = (char *) (file_contents + dyn_entry->d_un.d_val - loadaddr);

The code locates this section by treating the d_val field as an offset into the file contents it has mapped in memory. The dyn_entry type is:

typedef struct {
    unsigned char d_tag[8];               /* entry tag value */
    union {
      unsigned char       d_val[8];
      unsigned char       d_ptr[8];
    } d_un;
} Elf64_External_Dyn;

Next, we looked at how patchelf modifies the soname. The approach is quite straightforward: create a section that can accommodate the new soname, add the new soname at the end of that section, and append the new segment at the end of the file.

When patchelf writes the new section and populates the Elf64_External_Dyn struct described above, it does:

    for (auto dyn = dyn_table; (d_tag = rdi(dyn->d_tag)) != DT_NULL; dyn++)
            if (d_tag == DT_STRTAB)
                dyn->d_un.d_ptr = findSectionHeader(".dynstr").sh_addr;

The first thing we noticed was that they use different fields — d_val in one, d_ptr in the other. Though these are the same type and size and part of the same union, the field is treated as an offset in ldconfig but as an address in patchelf.
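The difference between the two interpretations can be made concrete with a small model. This is one plausible reading of the ldconfig snippet: it assumes loadaddr is a single bias derived from the first LOAD segment, which the excerpt does not show. The first segment and the string table address below are illustrative values, and only the second segment comes from the readelf output above.

```python
from typing import List, NamedTuple, Optional


class Load(NamedTuple):
    p_offset: int   # where the segment starts in the file
    p_vaddr: int    # where it is mapped in memory
    p_filesz: int   # how many bytes of the file it covers


def single_bias_offset(addr: int, segments: List[Load]) -> int:
    """Translate an address using one bias for the whole file (assumed)."""
    first = segments[0]
    return addr - (first.p_vaddr - first.p_offset)


def per_segment_offset(addr: int, segments: List[Load]) -> Optional[int]:
    """Translate an address through the LOAD segment that contains it."""
    for seg in segments:
        if seg.p_vaddr <= addr < seg.p_vaddr + seg.p_filesz:
            return addr - seg.p_vaddr + seg.p_offset
    return None


SEGMENTS = [
    Load(0x0, 0x0, 0x4A000),          # hypothetical original segment
    Load(0x4A000, 0x400000, 0x2588),  # segment appended by patchelf
]
FILE_SIZE = 0x4A000 + 0x2588
STRTAB_ADDR = 0x400AC8                # illustrative DT_STRTAB address

naive = single_bias_offset(STRTAB_ADDR, SEGMENTS)
exact = per_segment_offset(STRTAB_ADDR, SEGMENTS)
print(hex(naive), naive >= FILE_SIZE, hex(exact))
```

Under these assumptions the single-bias translation points far beyond the end of the file, which would fit a "truncated" diagnosis, while the per-segment translation lands inside the appended data.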

Let’s change patchelf again to try to work around this issue:

@@ -1191,7 +1191,7 @@ void ElfFile<ElfFileParamNames>::rewriteHeaders(Elf_Addr phdrAddress)
         unsigned int d_tag;
         for (auto dyn = dyn_table; (d_tag = rdi(dyn->d_tag)) != DT_NULL; dyn++)
             if (d_tag == DT_STRTAB)
-                dyn->d_un.d_ptr = findSectionHeader(".dynstr").sh_addr;
+                dyn->d_un.d_val = findSectionHeader(".dynstr").sh_offset;
             else if (d_tag == DT_STRSZ)
                 dyn->d_un.d_val = findSectionHeader(".dynstr").sh_size;
             else if (d_tag == DT_SYMTAB)

Hooray, ldconfig now parsed the file properly and created the libfoo.so.1 symlink pointing to libfoo.so.1.2. Unfortunately, when we tried to start our software, it crashed right away. 😭

It turns out that the string mapped at the offset is present in memory but actually mprotected with PROT_NONE by the runtime loader because our patched ELF file now has non-contiguous PT_LOAD segments. 🤯 Note that the runtime loader would actually be happy with the original patchelf code where sh_addr is used.

This meant that the only way for both the loader and ldconfig [3] (and potentially other parts of the system) to properly parse and use our patched ELF file was to make a large number of changes to patchelf.

There had to be an easier way.

Back to editing the file manually

Our initial approach almost worked except for this one version requirement that was pointing to a string we wanted to change. It occurred to us that if we could just drop it, things would work just fine. This was a version requirement for glibc 2.3, which is long gone [4]. If you’re going to ship a library to hundreds of customers, removing dependencies is a bad idea. But in our limited environment where we knew this requirement was fulfilled, this dependency was simply useless.

Simply dropping the entry would cause many offsets to change and create even more issues. However, we could overwrite it to point to something else. Let’s look again at the requirements:

$ readelf -V libfoo.so.1.2.old | grep -A5 "Version needs" 
Version needs section '.gnu.version_r' contains 2 entries:
 Addr: 0x00000000000059e8  Offset: 0x0059e8  Link: 4 (.dynstr)
  000000: Version: 1  File: ld-linux-x86-64.so.2  Cnt: 1
  0x0010:   Name: GLIBC_2.3  Flags: none  Version: 5
  0x0020: Version: 1  File: libc.so.6  Cnt: 6
  0x0030:   Name: GLIBC_2.14  Flags: none  Version: 8

We would like to avoid pointing to the GLIBC_2.3 string. What if we simply duplicated the 2nd requirement over the 1st one? Nothing would be pointing to the truncated GLIBC_2.3. Let’s look at the version requirement structures:

typedef struct {
	Elf64_Half    vn_version;
	Elf64_Half    vn_cnt;
	Elf64_Word    vn_file;
	Elf64_Word    vn_aux;
	Elf64_Word    vn_next;
} Elf64_Verneed;

typedef struct {
	Elf64_Word    vna_hash;
	Elf64_Half    vna_flags;
	Elf64_Half    vna_other;
	Elf64_Word    vna_name;
	Elf64_Word    vna_next;
} Elf64_Vernaux;

Elf64_Verneed describes the lines printed as File above and is followed by one or more Elf64_Vernaux representing the lines with Names above.

Now let’s print this in hex:

$ xxd -s 0x59e8 libfoo.so.1.2 | head -4                                                                                                                                                                
000059e8: 0100 0100 2419 0000 1000 0000 2000 0000  ....$....... ...
000059f8: 1369 690d 0000 0500 5c19 0000 0000 0000  .ii.....\.......
00005a08: 0100 0600 0f19 0000 1000 0000 0000 0000  ................
00005a18: 9491 9606 0000 0800 6619 0000 1000 0000  ........f.......

Line 1 is the Elf64_Verneed we wanted to replace with line 3. Line 2 is the Elf64_Vernaux we wanted to replace with line 4.

There was just one small adjustment we needed to make. The first File entry had only one Elf64_Vernaux entry while the second one had 6 (the Cnt field displayed by readelf above). When copying line 3, we changed its third byte from 0x06 to 0x01, which is the least significant byte of vn_cnt.
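To see how the hex dump maps onto the two structures, the four lines can be decoded with Python's struct module. This is a sketch for reading the dump, not a patching tool: the byte values are copied from the xxd output above, the layouts follow the struct definitions shown earlier, and elf_hash is our transcription of the classic System V ELF hash, which is what vna_hash appears to hold for these entries.

```python
import struct

# The four 16-byte lines of the xxd dump, in order.
DUMP = bytes.fromhex(
    "0100 0100 2419 0000 1000 0000 2000 0000"
    "1369 690d 0000 0500 5c19 0000 0000 0000"
    "0100 0600 0f19 0000 1000 0000 0000 0000"
    "9491 9606 0000 0800 6619 0000 1000 0000"
)

# Little-endian layouts: Half is 16 bits, Word is 32 bits.
VERNEED = struct.Struct("<HHIII")   # vn_version, vn_cnt, vn_file, vn_aux, vn_next
VERNAUX = struct.Struct("<IHHII")   # vna_hash, vna_flags, vna_other, vna_name, vna_next


def elf_hash(name: bytes) -> int:
    """Classic System V ELF hash of a byte string, as a 32-bit value."""
    h = 0
    for byte in name:
        h = ((h << 4) + byte) & 0xFFFFFFFF
        high = h & 0xF0000000
        if high:
            h ^= high >> 24
        h &= ~high & 0xFFFFFFFF
    return h


need_ld = VERNEED.unpack_from(DUMP, 0x00)    # line 1
aux_ld = VERNAUX.unpack_from(DUMP, 0x10)     # line 2
need_libc = VERNEED.unpack_from(DUMP, 0x20)  # line 3
aux_libc = VERNAUX.unpack_from(DUMP, 0x30)   # line 4

for label, fields in (("line 1", need_ld), ("line 2", aux_ld),
                      ("line 3", need_libc), ("line 4", aux_libc)):
    print(label, [hex(value) for value in fields])
```

The decode confirms that the third byte of line 3 is the low byte of vn_cnt, and that the hashes on lines 2 and 4 match the names GLIBC_2.3 and GLIBC_2.14, so copying an entry whole keeps hash and name consistent. It also shows the link fields: vn_next is 0x20 on line 1 and 0 on line 3, and vna_next is 0 on line 2 and 0x10 on line 4. The post does not spell out how these were handled, so anyone replaying the edit should check them rather than assume a verbatim copy is enough.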

After this change our requirements look like:

$ readelf -V libfoo.so.1.2 | grep -A5 "Version needs" 
Version needs section '.gnu.version_r' contains 2 entries:
 Addr: 0x00000000000059e8  Offset: 0x0059e8  Link: 4 (.dynstr)
  000000: Version: 1  File: libc.so.6  Cnt: 1
  0x0010:   Name: GLIBC_2.14  Flags: none  Version: 8
  0x0020: Version: 1  File: libc.so.6  Cnt: 6
  0x0030:   Name: GLIBC_2.14  Flags: none  Version: 8

The output looks quite silly with the duplicated requirement. But with this last set of changes, the library could be loaded flawlessly by all the tools on the system. We could finally declare victory! 🥳

Conclusion

This was certainly a journey. It was quite satisfying for our team to have managed to solve this issue. Our vendor eventually reached out and agreed to fix the soname in their next revision so all is well!

If you have found this post interesting, check out other engineering-oriented posts on our blog. If you would like to know more about HRT, check out our website or consider applying to join our team!

Further reading

Linkers and Loaders by John R. Levine
The Linux Standard Base for x86_64
The ELF format poster

Appendix: Binary compatibility in shared libraries

As we mentioned in our post, the soname of a library must be changed if it is not backwards compatible. Backwards compatibility for libraries is not always well understood so we thought we’d give some background about it.

Backwards compatibility means that code compiled and linked with an older version of the library will run just the same if the shared library is upgraded without recompiling the dependent code.

It’s a crucial property for shared libraries, as upgrading both libraries and dependent code in lockstep is a lot more involved than just copying in a new shared library. It also makes reverting potentially a lot more difficult. If a shared library is backwards compatible, users can upgrade the library across their system and then start recompiling the code against it. Each step is safe and should be easy to revert.

Ensuring backward compatibility is not trivial and requires an understanding of the platform ABI. However, here are some simple changes that preserve backwards compatibility:

  • Adding a new function or a non-virtual method to a library
  • Adding new types
  • Any non-inlined logic changes (for example, adding a check in an existing function that some pointer is not NULL) that do not change the contract of the library
  • Adding new static or global variables

A very common thing that we’d like to discuss is adding new fields to existing types used in the public interface. It’s not as straightforward as other changes we mentioned above but it’s possible to do it with a little bit of planning.

The basic idea is to only add fields at the end of a struct, always pass the struct by address, and give the new code a way to distinguish between the older and the newer layouts at runtime. There are two straightforward ways to do that:

  1. Passing the size of the struct along with the pointer. It’s tedious but it’s bulletproof. The size argument can be added automatically by using a forced-inline function.
  2. Adding a flag field that indicates which fields are present in the struct. This technique can also be hidden from the user by using forced-inline constructors.
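The first technique can be modelled with ctypes, which lays structures out the way C does. Everything in this sketch is hypothetical: OptionsV1, OptionsV2, lib_configure and configure stand in for a library's old and new struct layouts, its exported entry point, and the forced-inline wrapper that fills in the size for the caller.

```python
import ctypes


class OptionsV1(ctypes.Structure):
    """Layout that old callers were compiled against."""
    _fields_ = [("timeout_ms", ctypes.c_uint32)]


class OptionsV2(ctypes.Structure):
    """Newer layout: the new field is appended, existing offsets are kept."""
    _fields_ = [("timeout_ms", ctypes.c_uint32),
                ("retries", ctypes.c_uint32)]


DEFAULT_RETRIES = 3  # used when the caller's struct predates the field


def lib_configure(opts_ptr, size):
    """Library side, built against V2 but still accepting V1 callers."""
    if size >= ctypes.sizeof(OptionsV2):
        opts = ctypes.cast(opts_ptr, ctypes.POINTER(OptionsV2)).contents
        return opts.timeout_ms, opts.retries
    # Smaller struct: only touch the fields that existed in V1.
    opts = ctypes.cast(opts_ptr, ctypes.POINTER(OptionsV1)).contents
    return opts.timeout_ms, DEFAULT_RETRIES


def configure(opts):
    """Caller-side wrapper that passes the size of whatever layout it has."""
    return lib_configure(ctypes.pointer(opts), ctypes.sizeof(opts))


if __name__ == "__main__":
    print(configure(OptionsV1(timeout_ms=100)))
    print(configure(OptionsV2(timeout_ms=100, retries=7)))
```

Because the size is captured on the caller's side, an application built against the old header keeps passing the old size after the library is upgraded, and the library never reads past the memory the caller actually allocated.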

This is only a brief introduction to the subject. For a more in-depth discussion and more advanced techniques, we recommend How to write shared libraries by Ulrich Drepper and How the GNU C library handles backwards compatibility.


1. “readelf -d file | grep SONAME” can be used to display the soname in “file”. The readelf tool is part of the binutils package.
2. We use xxd, which ships with the vim editor, to display a hex dump, but any tool that dumps hexadecimal would do.
3. This turned out to be a bug in readelflib.c in glibc, which was fixed in glibc 2.31.
4. It was released in 2002: https://sourceware.org/glibc/wiki/Glibc%20Timeline
