Build a C library with a user-defined allocator

Just bring your own malloc/free functions!

The story starts with the wish to build a C shared library that does not depend on the standard library.

Context

When our project needs to read files, open network sockets or merely print some logs, we are likely to rely on the libc. Indeed, it provides the interface to do these tasks easily (remember we are coding in C). But what if we develop a simple algorithm that only performs math operations? Maybe we could do without the standard library.

The advantages are numerous, especially for portability: the final library is likely to run anywhere (from x86 to Arduino) without heavy lifting.

Back to reality

The goal is appealing, but we generally need some interaction with the OS (libc is the common interface for that). In particular, I am talking about memory allocation. Yes, we generally need to create structures and store other data (see the example below).

Let us assume the following project: we want to build a histogram that updates its bin counts as we feed it new data. Our project structure is given below.

libhistogram/
  | histogram.c
  | histogram.h

with the draft of our header (histogram.h):

// histogram.h
#include <stddef.h> // for size_t definition (it depends on the target)

/* A basic structure to build histogram between a and b */
struct histogram {
  double a;                // left bound
  double b;                // right bound
  size_t bins;             // number of bins
  int* counts;             // number of data points in each bin
  double* right_bounds;    // right bound of each bin
};

/**
 * @brief Insert a new data point
 * @param h pointer to histogram
 * @param data value between a and b
 * @return 0 if the operation is successful
 */
int feed_histogram(struct histogram *h, double data);

The burning issue is that, at some point, we need to instantiate that structure. How do we do that without malloc? By malloc, we mean any function that can dynamically allocate memory, with this kind of prototype:

void* malloc(size_t);

Stack allocation

The first solution is to let the user do the job by stack allocating the structure and filling its fields.

// main.c
#include "histogram.h"
#define BINS 10

int main(void) {
    // allocate everything on the stack
    int counts[BINS] = {0};     // bins start empty
    double right_bounds[BINS];
    struct histogram h;
    // and init the structure by yourself
    h.a = 0.0;
    h.b = 1.0;
    h.bins = BINS;
    h.counts = counts;
    h.right_bounds = right_bounds;
    for (size_t i = 0; i < BINS; ++i) {
        // right bound of bin i
        h.right_bounds[i] = h.a + (h.b - h.a) * (double)(i + 1) / BINS;
    }

    for (double x = 0.0; x < 100.0; ++x) {
      feed_histogram(&h, x / 100.0);
    }
    return 0;
}

Great! But notice that everything is hard-coded and the user has to do all the work, so the library API is not friendly. There is also a third issue, one of design (or security): the user can access all the memory behind the structure. From a design point of view, it is better to separate what the user needs from what the library needs. As for security, the user should not be trusted :) so they should not even know about the underlying details.

To solve these problems, we may rather need dynamic allocation.

Just give me an allocator

Obviously, we could let the user call malloc to set up all the memory, but their work would remain the same: calling malloc several times to fully define the structure (a very bad library API, hence a very bad developer experience).

Basically, our library needs several calls to malloc but remember that our library does not know this function. So we have to update the library to let the user provide this function (and free too).

// histogram.h

// alias of malloc and free prototypes
typedef void *(*malloc_fn)(size_t);
typedef void (*free_fn)(void *);

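// global allocator
// NOTE: these are (tentative) definitions, which is fine as long as a single
// C file of the final binary includes this header. If several do, declare
// them `extern` here and define them once in histogram.c.
malloc_fn umalloc;
free_fn ufree;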

/**
 * @brief Define allocator
 */
void set_allocator(malloc_fn m, free_fn f);

/* ... */

Providing these two functions looks like a constraint for the user, but it is a strength. Now they can implement their own allocator, tailored to their needs (speed, space, security, debugging...). For instance, on Arduino, the available memory is very limited, so you can imagine developing a custom thrifty allocator to manage dynamic allocation. If the user does not need this feature, they can simply pass the default malloc and free of the standard library (we will see an example later).
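To make the "custom thrifty allocator" idea concrete, here is a minimal sketch of a bump (arena) allocator that carves blocks out of a fixed static buffer. It is only an illustration under stated assumptions: the names ARENA_SIZE, arena_malloc, arena_free and arena_reset are hypothetical and not part of libhistogram, individual blocks are never reclaimed, and the code is not thread-safe.

```c
#include <stddef.h>

/* Hypothetical arena size, chosen for illustration only. */
#define ARENA_SIZE 1024u

/* A union of the widest common types, used to keep every block aligned. */
typedef union {
    long long ll;
    double d;
    void *p;
} arena_align_t;

#define ARENA_ALIGN sizeof(arena_align_t)

/* The whole "heap": a static buffer, so no OS or libc call is needed. */
static arena_align_t arena[ARENA_SIZE / sizeof(arena_align_t)];
static size_t arena_used = 0;

/* Same prototype as malloc: returns NULL when the arena is exhausted. */
void *arena_malloc(size_t size) {
    if (size == 0 || size > ARENA_SIZE) {
        return NULL;
    }
    /* round the request up to a multiple of ARENA_ALIGN */
    size_t rounded = (size + ARENA_ALIGN - 1) / ARENA_ALIGN * ARENA_ALIGN;
    if (rounded > ARENA_SIZE - arena_used) {
        return NULL;
    }
    void *block = (unsigned char *)arena + arena_used;
    arena_used += rounded;
    return block;
}

/* Same prototype as free: a bump allocator cannot release a single block. */
void arena_free(void *ptr) {
    (void)ptr;
}

/* Release everything at once (only valid when no block is still in use). */
void arena_reset(void) {
    arena_used = 0;
}
```

Since arena_malloc and arena_free have the same prototypes as malloc and free, they could be handed to the library with set_allocator(arena_malloc, arena_free). The trade-off is deliberate: allocation is a few instructions and uses no system memory, at the cost of only being able to free everything at once.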

The histogram interface is now the following. Very clean for the user!

// histogram.h

struct histogram { /* ... */ };

/**
 * @brief Define allocator
 */
void set_allocator(malloc_fn m, free_fn f);

/**
 * @brief Initialize a histogram (NEW!)
 * @param a left bound
 * @param b right bound
 * @param bins number of bins
 * @return struct histogram* (pointer to histogram), NULL on failure
 */
struct histogram* init_histogram(double a, 
                                 double b, 
                                 size_t bins);

/**
 * @brief Free the memory held by a histogram
 * @param h pointer to histogram
 */
void free_histogram(struct histogram* h);

/**
 * @brief Insert a new data point
 * @param h pointer to histogram
 * @param data value between a and b
 * @return 0 if the operation is successful
 */
int feed_histogram(struct histogram *h, double data);

On the implementation side, the library can then do all the work on its own:

// histogram.c
#include "histogram.h"

void set_allocator(malloc_fn m, free_fn f) {
    umalloc = m;
    ufree = f;
}

struct histogram *init_histogram(double a, double b, size_t bins) {
    // ensure we have an allocator
    if ((!umalloc) || (!ufree)) {
        return NULL;
    }

    // parameters check
    if ((b <= a) || (bins == 0)) {
        return NULL;
    }

    // structure allocation
    struct histogram *h =
        (struct histogram *)umalloc(sizeof(struct histogram));
    if (!h) {
        return h;
    }

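    // now fill the structure fields
    h->a = a;
    h->b = b;
    h->bins = bins;
    // NULL the pointers first, so that free_histogram(h) is safe
    // if one of the allocations below fails
    h->counts = NULL;
    h->right_bounds = NULL;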
    int *counts = (int *)umalloc(bins * sizeof(int));
    if (counts) {
        h->counts = counts;
    } else {
        free_histogram(h);
        return NULL;
    }

    double *right_bounds = (double *)umalloc(bins * sizeof(double));
    if (right_bounds) {
        h->right_bounds = right_bounds;
    } else {
        free_histogram(h);
        return NULL;
    }

    double const w = (b - a) / (double)bins;

    for (size_t i = 0; i < bins; ++i) {
        h->counts[i] = 0;           // bins are empty
        h->right_bounds[i] = a + w * (double)(i + 1); // right bound of bin i
    }

    return h;
}

void free_histogram(struct histogram *h) {
    if (h) {
        if (h->counts) {
            ufree(h->counts);
        }
        if (h->right_bounds) {
            ufree(h->right_bounds);
        }
        ufree(h);
    }
}

int feed_histogram(struct histogram *h, double data) { /* ... */ }
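The body of feed_histogram is left out of the listing above. Purely as an illustration, one possible way to write it is sketched below; this is not necessarily the original implementation. It is named feed_histogram_sketch (a hypothetical name) to make that clear, it repeats the struct definition so that it stands alone, and it assumes that right_bounds[i] holds the upper edge of bin i and that -1 is an acceptable error code.

```c
#include <stddef.h>

/* Same layout as in histogram.h, repeated so the sketch is self-contained. */
struct histogram {
    double a;
    double b;
    size_t bins;
    int *counts;
    double *right_bounds;
};

/* Hypothetical implementation: returns 0 on success, -1 on invalid input. */
int feed_histogram_sketch(struct histogram *h, double data) {
    if (!h || !h->counts || !h->right_bounds || h->bins == 0) {
        return -1;
    }
    /* reject values outside [a, b]; this also rejects NaN,
       because every comparison with NaN is false */
    if (!(data >= h->a && data <= h->b)) {
        return -1;
    }
    /* linear scan: the first bin whose right bound exceeds data */
    for (size_t i = 0; i < h->bins; ++i) {
        if (data < h->right_bounds[i]) {
            h->counts[i]++;
            return 0;
        }
    }
    /* data equals b (or sits above the last bound after rounding):
       count it in the last bin */
    h->counts[h->bins - 1]++;
    return 0;
}
```

The scan is linear in the number of bins, which is fine for small histograms. Since the bins have equal width, the index could also be computed directly from (data - a) / (b - a), at the cost of handling rounding at the edges.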

Compilation

Remember that we want to compile our library without the standard library. No problem, compilers have a flag for that: -nostdlib.

$ gcc -std=c99 -Wall -pedantic -o libhistogram.so histogram.c -nostdlib -fPIC -shared

On my Linux laptop (amd64, Fedora), I get:

$ file libhistogram.so
libhistogram.so: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, BuildID[sha1]=f6f755c7fe68110ded02ab8991be5ab688400a31, not stripped
$ ldd libhistogram.so
        statically linked

Here, we have a shared library with no external dependencies, so we can easily deploy it on similar systems without worrying about requirements.

Usage

The key point is that we must provide an allocator. By default, you can still pass the malloc and free functions from the libc that lives on your system.

Let us give an example... in Python! Below, we define a Python interface to libhistogram thanks to ctypes, the powerful foreign function module of the standard library.

# histogram.py
from ctypes import CDLL, c_double, c_int, c_size_t, c_void_p

# load libhistogram
libhistogram = CDLL("./libhistogram.so")

# histogram API ============================================

libhistogram.set_allocator.argtypes = [c_void_p, c_void_p]
libhistogram.set_allocator.restype = None

libhistogram.init_histogram.argtypes = [c_double, c_double, c_size_t]
libhistogram.init_histogram.restype = c_void_p

libhistogram.free_histogram.argtypes = [c_void_p]
libhistogram.free_histogram.restype = None

libhistogram.feed_histogram.argtypes = [c_void_p, c_double]
libhistogram.feed_histogram.restype = c_int

# ======================================================

# in this example we pass the classical malloc
# and free functions to libhistogram
libc = CDLL(None)  # handle to the current process, which already has libc loaded (POSIX)
libhistogram.set_allocator(libc.malloc, libc.free)

class Histogram:
    """This class wraps libhistogram"""
    def __init__(
        self,
        a: float = 0.0,
        b: float = 1.0,
        bins: int = 10,
    ) -> None:
        self.__ptr = libhistogram.init_histogram(a, b, bins)
        assert self.__ptr is not None, (
            "An error occurred at initialization, "
            "maybe you did not pass an allocator to the library."
        )

    def __del__(self) -> None:
        libhistogram.free_histogram(self.__ptr)

    def feed(self, data: float) -> int:
        return libhistogram.feed_histogram(self.__ptr, data)

Conclusion

We have presented a way to build a shared library that does not rely on the standard library, at least when malloc and free are the only functions it needs. The idea is just to add an API endpoint (i.e. a function) to define the allocator to use. As a direct consequence, it allows the use of custom allocators (the Zig programming language, for instance, is notably designed around this idea).