Build a C library with user-defined allocator
Just bring your own malloc/free functions!
The story starts with the wish to build a C shared library that does not depend on the standard library.
Context
When our project needs to read files, open network sockets or merely print some logs, we are likely to rely on the libc. Indeed, it provides the interface to do these tasks easily (remember we are coding in C). But what if we develop a simple algorithm that merely performs math operations? Maybe we could go without that standard library.
The advantages are numerous, especially for portability: the final lib is likely to run anywhere (from x86 to Arduino) without heavy lifting.
Back to reality
The goal is beautiful but we generally need some interactions with the OS (libc is the common interface for that). In particular, I am talking about memory allocation. Yes, we generally need to create structures and store other data (see the example below).
Let us assume the following project: we want to build a histogram that updates the content of its bins when we feed it new data. Our project structure is given below.
libhistogram/
| histogram.c
| histogram.h
with the draft of our header (histogram.h):
// histogram.h
#include <stddef.h> // for size_t definition (it depends on the target)

/* A basic structure to build a histogram between a and b */
struct histogram {
    double a;             // left bound
    double b;             // right bound
    size_t bins;          // number of bins
    int* counts;          // bins
    double* right_bounds; // bins boundaries
};

/**
 * @brief Insert a new data point
 * @param h pointer to histogram
 * @param data value between a and b
 * @return 0 if the operation is successful
 */
int feed_histogram(struct histogram *h, double data);
The burning issue is that, at some point, we need to instantiate that structure. How can we do that without malloc? By malloc we mean any function that can (dynamically) allocate memory, with this kind of prototype:
void* malloc(size_t);
Stack allocation
The first solution is to let the user do the job by stack allocating the structure and filling its fields.
// main.c
#include "histogram.h"

#define BINS 10

int main(int argc, const char* argv[]) {
    // allocate everything on the stack
    size_t const n = BINS;
    int counts[BINS] = {0}; // bins start empty
    double right_bounds[BINS];
    struct histogram h;
    // and init the structure by yourself
    h.a = 0.0;
    h.b = 1.0;
    h.bins = n;
    h.counts = &counts[0];
    h.right_bounds = &right_bounds[0];
    for (size_t i = 0; i < n; ++i) {
        // right bound of bin i
        right_bounds[i] = h.a + (h.b - h.a) * (double)(i + 1) / (double)n;
    }
    for (double x = 0.0; x < 100.0; ++x) {
        feed_histogram(&h, x / 100.0);
    }
    return 0;
}
Great! But notice that everything is hard-coded and the user has work to do. The library API here is not friendly. There is also a third issue, of design (or security): the user can access all the memory behind the structure. From a design point of view, it is better to separate what the user needs from what the library needs. For the security part, the user should not be trusted :) so they should not even know about the underlying details.
To solve these problems, we need dynamic allocation instead.
Just give me an allocator
Obviously, we can let the user call malloc to set up all the memory, but their work remains the same: calling malloc several times to fully define the structure (very bad library API, so very bad DX).
Basically, our library needs several calls to malloc, but remember that our library does not know this function. So we have to update the library to let the user provide this function (and free too).
// histogram.h

// alias of malloc and free prototypes
typedef void *(*malloc_fn)(size_t);
typedef void (*free_fn)(void *);

// declare global allocator
malloc_fn umalloc;
free_fn ufree;

/**
 * @brief Define allocator
 */
void set_allocator(malloc_fn m, free_fn f);
/* ... */
Providing these two functions looks like a constraint for the user but it is a strength. Now they can implement their own allocator, specific to their needs (speed, space, security, debug...). For instance, on Arduino, the available memory is very limited so you can imagine developing a custom thrifty allocator to manage dynamic allocation. If the user's project does not require this feature, they can pass the default malloc and free of the standard library (we will see an example later).
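To make the "thrifty allocator" idea concrete, here is a minimal sketch of a bump allocator that serves memory from a fixed static buffer, so it needs nothing from the libc. The names arena_malloc, arena_free, arena_reset and the ARENA_SIZE budget are assumptions made for this illustration, not part of libhistogram. The two aliases are repeated only so the sketch compiles on its own.

```c
#include <stddef.h> /* size_t, NULL */

/* Same aliases as in histogram.h, repeated so the sketch stands alone. */
typedef void *(*malloc_fn)(size_t);
typedef void (*free_fn)(void *);

#define ARENA_SIZE 1024u               /* hypothetical memory budget */
#define ARENA_ALIGN sizeof(double)     /* every block starts on this boundary */

/* The union forces the buffer to be aligned for doubles and pointers. */
static union {
    unsigned char bytes[ARENA_SIZE];
    double force_double_align;
    void *force_pointer_align;
} arena;

static size_t arena_used = 0;

/* Hand out consecutive slices of the buffer; NULL when it is exhausted. */
void *arena_malloc(size_t size) {
    if (size == 0 || size > ARENA_SIZE - arena_used) {
        return NULL;
    }
    /* round up so that the next block stays aligned */
    size_t const rounded = (size + ARENA_ALIGN - 1) / ARENA_ALIGN * ARENA_ALIGN;
    if (rounded > ARENA_SIZE - arena_used) {
        return NULL;
    }
    void *p = &arena.bytes[arena_used];
    arena_used += rounded;
    return p;
}

/* A bump allocator cannot release a single block: this is a no-op. */
void arena_free(void *p) {
    (void)p;
}

/* Release everything at once, e.g. when all histograms are discarded. */
void arena_reset(void) {
    arena_used = 0;
}
```

With the API of this article, it would be plugged in with set_allocator(arena_malloc, arena_free). The trade-off is deliberate: allocation is a few instructions and memory usage is bounded, but individual blocks are never reclaimed until arena_reset is called.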
The histogram interface is now the following. Very clean for the user!
// histogram.h
struct histogram { /* ... */ };

/**
 * @brief Define allocator
 */
void set_allocator(malloc_fn m, free_fn f);

/**
 * @brief Initialize a histogram (NEW!)
 * @param a left bound
 * @param b right bound
 * @param bins number of bins
 * @return struct histogram* (pointer to histogram), NULL on failure
 */
struct histogram* init_histogram(double a,
                                 double b,
                                 size_t bins);

/**
 * @brief free memory
 * @param h pointer to histogram
 */
void free_histogram(struct histogram* h);

/**
 * @brief Insert a new data point
 * @param h pointer to histogram
 * @param data value between a and b
 * @return 0 if the operation is successful
 */
int feed_histogram(struct histogram *h, double data);
On the implementation side, the library can then do all the work on its own:
// histogram.c
#include "histogram.h"

void set_allocator(malloc_fn m, free_fn f) {
    umalloc = m;
    ufree = f;
}
struct histogram *init_histogram(double a, double b, size_t bins) {
    // ensure we have an allocator
    if ((!umalloc) || (!ufree)) {
        return NULL;
    }
    // parameters check
    if ((b <= a) || (bins == 0)) {
        return NULL;
    }
    // structure allocation
    struct histogram *h =
        (struct histogram *)umalloc(sizeof(struct histogram));
    if (!h) {
        return NULL;
    }
    // now fill the structure fields
    h->a = a;
    h->b = b;
    h->bins = bins;
    // NULL the pointers first, so free_histogram is safe on a partial init
    h->counts = NULL;
    h->right_bounds = NULL;
    int *counts = (int *)umalloc(bins * sizeof(int));
    if (counts) {
        h->counts = counts;
    } else {
        free_histogram(h);
        return NULL;
    }
    double *right_bounds = (double *)umalloc(bins * sizeof(double));
    if (right_bounds) {
        h->right_bounds = right_bounds;
    } else {
        free_histogram(h);
        return NULL;
    }
    double const w = (b - a) / (double)bins; // bin width
    for (size_t i = 0; i < bins; ++i) {
        h->counts[i] = 0;                             // bins are empty
        h->right_bounds[i] = a + w * (double)(i + 1); // right bound of bin i
    }
    return h;
}
void free_histogram(struct histogram *h) {
    if (h) {
        if (h->counts) {
            ufree(h->counts);
        }
        if (h->right_bounds) {
            ufree(h->right_bounds);
        }
        ufree(h);
    }
}
int feed_histogram(struct histogram *h, double data) { /* ... */ }
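The body of feed_histogram is left out above. As an illustration only, here is one possible sketch, assuming that right_bounds[i] holds the upper edge of bin i, that bins are half-open on the right, and that a value equal to b goes into the last bin. The -1 error code is also an assumption (the header only says that 0 means success). The structure is repeated so the sketch compiles on its own; note that it needs nothing from the libc either.

```c
#include <stddef.h> /* size_t */

/* Same layout as in histogram.h, repeated so the sketch stands alone. */
struct histogram {
    double a;             /* left bound */
    double b;             /* right bound */
    size_t bins;          /* number of bins */
    int *counts;          /* bins */
    double *right_bounds; /* bins boundaries */
};

/* Hypothetical implementation: linear scan over the right bounds. */
int feed_histogram(struct histogram *h, double data) {
    if (!h || !h->counts || !h->right_bounds || h->bins == 0) {
        return -1;
    }
    /* reject values outside [a, b]; a NaN fails this test too */
    if (!(data >= h->a && data <= h->b)) {
        return -1;
    }
    for (size_t i = 0; i < h->bins; ++i) {
        if (data < h->right_bounds[i]) {
            h->counts[i] += 1;
            return 0;
        }
    }
    /* data == b (or rounding pushed it past the last bound): last bin */
    h->counts[h->bins - 1] += 1;
    return 0;
}
```

A linear scan is enough for a handful of bins. Since the bins have equal width, the index could also be computed directly from (data - a) / (b - a), at the price of a little care around rounding.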
Compilation
Remember that we want to compile our library without the standard library. No problem, compilers have a flag for that: -nostdlib.
$ gcc -std=c99 -Wall -pedantic -o libhistogram.so histogram.c -nostdlib -fPIC -shared
On my Linux laptop (amd64, Fedora), I get:
$ file libhistogram.so
libhistogram.so: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, BuildID[sha1]=f6f755c7fe68110ded02ab8991be5ab688400a31, not stripped
$ ldd libhistogram.so
statically linked
Here, we have a shared library that has no external dependencies. So we can easily deploy it on similar systems without worrying about requirements.
Usage
The paramount point is that we must provide an allocator. By default, you can still pass the malloc and free functions from the libc that lives on your system.
Let us give an example... in Python! Below, we define a Python interface to libhistogram thanks to the powerful ctypes built-in library.
# histogram.py
from ctypes import CDLL, c_double, c_int, c_size_t, c_void_p
# load libhistogram
libhistogram = CDLL("./libhistogram.so")
# histogram API ============================================
libhistogram.set_allocator.argtypes = [c_void_p, c_void_p]
libhistogram.set_allocator.restype = None
libhistogram.init_histogram.argtypes = [c_double, c_double, c_size_t]
libhistogram.init_histogram.restype = c_void_p
libhistogram.free_histogram.argtypes = [c_void_p]
libhistogram.free_histogram.restype = None
libhistogram.feed_histogram.argtypes = [c_void_p, c_double]
libhistogram.feed_histogram.restype = c_int
# ======================================================
# in this example we pass the classical malloc
# and free functions to libhistogram
libc = CDLL(None)  # handle on the running process, where libc is already loaded
libhistogram.set_allocator(libc.malloc, libc.free)
class Histogram:
    """This class wraps libhistogram"""

    def __init__(
        self,
        a: float = 0.0,
        b: float = 1.0,
        bins: int = 10,
    ) -> None:
        self.__ptr = libhistogram.init_histogram(a, b, bins)
        # a NULL pointer comes back as None through c_void_p
        assert self.__ptr is not None, (
            "An error occurred at initialization, "
            "maybe you did not pass an allocator to the library."
        )

    def __del__(self) -> None:
        libhistogram.free_histogram(self.__ptr)

    def feed(self, data: float) -> int:
        return libhistogram.feed_histogram(self.__ptr, data)
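Debugging was one of the motivations listed earlier for custom allocators. As a sketch on the C side, thin wrappers around the libc malloc and free can count live allocations and reveal a leak in the library. The names counting_malloc, counting_free and live_allocation_count are made up for this illustration, and the aliases are repeated so the block compiles on its own.

```c
#include <stddef.h> /* size_t */
#include <stdlib.h> /* malloc, free: this code lives on the user side */

/* Same aliases as in histogram.h, repeated so the sketch stands alone. */
typedef void *(*malloc_fn)(size_t);
typedef void (*free_fn)(void *);

static long live_allocations = 0;

/* Forward to malloc and remember that one more block is alive. */
void *counting_malloc(size_t size) {
    void *p = malloc(size);
    if (p) {
        live_allocations += 1;
    }
    return p;
}

/* Forward to free; releasing NULL does not change the counter. */
void counting_free(void *p) {
    if (p) {
        live_allocations -= 1;
    }
    free(p);
}

/* Number of blocks handed out and not yet released. */
long live_allocation_count(void) {
    return live_allocations;
}
```

After set_allocator(counting_malloc, counting_free), a call to init_histogram followed by free_histogram should bring live_allocation_count() back to 0. Any other value would point to a block that the library forgot to release.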
Conclusion
We have presented a way to build a shared library that does not rely on the standard library, in a case where malloc and free were the only functions needed from it. The idea is simply to add an API entry point (i.e. a function) that lets the user define the allocator. As a direct consequence, custom allocators can be used (passing allocators explicitly is, for instance, a core design choice of the Zig programming language).