A simple C++ module system with CMake

27 November 2024

Implementing a module/plugin system for a C++ application to get faster compile times.

One way to implement a simple module/plugin system in C++ using CMake.

By module or plugin, I mean replaceable software components, each serving a similar function but with a different implementation. Not C++20 modules.

Background

In my current role, my team and I have been in the gradual process of converting a lot of Python code into C++. The software handles large volumes of time-series datasets, and is consequently difficult to parallelise. With the volume of data increasing over time, the processing time of the Python processors had been steadily creeping up - taking on the order of hours for some datasets.

After converting legacy Python code for a dataset into C++, we typically see processing times drop by a factor of 10 or so - the conversions have been a great success. But as more code was implemented in the C++ application, build times also steadily increased, to the point of slowing down day-to-day development and CI.

The original architecture

At a high level, the purpose of this application is pretty simple: given a particular dataset, run the code which processes that dataset to generate our desired output. The single executable therefore contains all the code needed to handle every dataset.

In main(), we had something akin to the following:

#include "dataset1.hpp"
#include "dataset2.hpp"
#include "dataset3.hpp"
// ...

void run_for_dataset(const std::string& dataset_name, ...) {
  if (dataset_name == "dataset1") {
    run_dataset_1();
  } else if (dataset_name == "dataset2") {
    run_dataset_2();
  } else if (dataset_name == "dataset3") {
    run_dataset_3();
  } // ...

  throw std::runtime_error{"Unknown dataset"};
}

int main(int argc, char* argv[]) {
  run_for_dataset(...);
}

Under the hood, each dataset processor is implemented using a heavily-templated class. Each class inherits from a thin virtual base class, so we have a shared processing loop which uses dynamic dispatch on the processor object.
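To make that concrete, here's a purely illustrative sketch of the shape of that architecture - the names (BaseProcessor, DatasetProcessor, run_processing_loop) and details are invented for this article, not taken from the real code:

#include <utility>
#include <vector>

// Thin virtual base class: the shared processing loop only sees this interface.
struct BaseProcessor {
  virtual ~BaseProcessor() = default;
  virtual bool finished() const = 0;
  virtual void process_next_batch() = 0;
};

// Each dataset has a heavily-templated implementation, typically header-only.
template <typename Record, typename Config>
class DatasetProcessor final : public BaseProcessor {
public:
  explicit DatasetProcessor(Config config) : config_{std::move(config)} {}

  bool finished() const override { return buffer_.empty(); }
  void process_next_batch() override { buffer_.clear(); /* dataset-specific work */ }

private:
  Config config_;
  std::vector<Record> buffer_;
};

// Shared processing loop, written once against the virtual interface.
inline void run_processing_loop(BaseProcessor& processor) {
  while (!processor.finished()) {
    processor.process_next_batch();
  }
}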

Templated classes are typically defined and implemented in headers, so that's what developers tended to stick to. This resulted in tons of template instantiations within a single translation unit: the main source file.

Not only did this take a long time and consume gigabytes of memory (to the point where our build agents started failing), it was also wasteful for an individual developer's purposes. While developing for a specific dataset, one is typically only changing the implementation of, and running on, that single dataset. But each time the main executable is built for testing, there's a lot of recompiling and linking of irrelevant code for other datasets.

At some point, developers started locally commenting out the code for other datasets in main() to dramatically reduce their build times. But local patches are inherently fragile: they produce merge conflicts and risk being accidentally committed and included in a code review.

It was clear something had to be done. I suggested moving template implementations into source files as much as possible, but this wouldn't help the extreme link times, and it was very tedious to do manually. While alternative linkers like mold are available, I haven't found them to be much help in the build time department. Ideally, we'd have a system that allowed building the implementation(s) for just the required dataset(s).
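For what it's worth, here's a sketch of what that suggestion looks like in practice - an invented Processor class, not our real code. The template's definition moves out of the header, and the source file explicitly instantiates it for the configurations it knows about:

// processor.hpp
#pragma once

template <typename Config>
class Processor {
public:
  explicit Processor(Config config) : config_{config} {}
  void run();  // definition lives in processor.cpp

private:
  Config config_;
};

// processor.cpp
#include "processor.hpp"

template <typename Config>
void Processor<Config>::run() {
  // dataset-specific processing...
}

// Explicit instantiation: the heavy compilation happens once, here, rather
// than in every translation unit that includes the header.
struct Dataset1Config {};  // stand-in for a real configuration type
template class Processor<Dataset1Config>;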

Other developers had started using a script which automatically edited main() for you, while I started on a build system solution. This ended up taking much longer to merge than I expected.

A simple run-time module system

To summarise, I was aiming for a solution which:

  1. completely avoided compiling & linking code for unwanted datasets
  2. didn't require any local code changes to select the desired datasets
  3. supported building shared libraries rather than just static, in case that helped build times
  4. was reasonably straightforward to use

CMake is our build system, so checking number 4) may be inherently difficult to achieve. But my first thought was to use CMake build options to pick which dataset targets the final executable should depend on. This would trivially check the first box, as the build tool will only build and link targets which are required by the target being built. It should also make 3) easy enough - we should be able to trivially swap the module targets from STATIC to SHARED libraries.

The question then is how to solve 2). Obviously the big if block had to go, because we don't want to #include the code for each dataset, and not all of them will be available. What we needed was a sort of plugin system, where each plugin would automatically register itself when it was compiled as part of the application.

I struggle to recall exactly where I got this pattern from, but in the end I opted for a system which made use of static initialisation:

  1. Create a static global map
  2. Create a class which, in its constructor, inserts into that static global map information about a specific "module" which handles a dataset
  3. Create a static instance of the above class for each module, passing in its information - most importantly, a function for initialising the module

So by the time we hit main(), all of the static objects for each module should be initialised, populating the global map. To run a specific module, we search that map for the name we're looking for, and execute the function to initialise and/or run it.

I've implemented a simple cut-down version of this system in this Git repository.

Hopefully the code should be mostly self-explanatory, but I'll step through it below.

Building & running the example project

$ ./build/src/program
Usage: ./build/src/program <module name>

Available modules:
  module1
    Module 1 does some stuff
  module2
    Module 2 does some other stuff

$ ./build/src/program module1
Module 1 has done some stuff

$ ./build/src/program module2
Module 2 has done some stuff

Build with all modules:

$ cmake -B build -DENABLED_MODULES= >/dev/null \
  && cmake --build build \
  && ./build/src/program
ninja: no work to do.
Usage: ./build/src/program <module name>

Available modules:
  module1
    Module 1 does some stuff
  module2
    Module 2 does some other stuff

Enable just module1:

$ cmake -B build -DENABLED_MODULES=module1 >/dev/null \
  && cmake --build build \
  && ./build/src/program
[2/2] Linking CXX executable src/program
Usage: ./build/src/program <module name>

Available modules:
  module1
    Module 1 does some stuff

Enable both modules:

$ cmake -B build -DENABLED_MODULES='module1;module2' >/dev/null \
  && cmake --build build \
  && ./build/src/program
[2/2] Linking CXX executable src/program
Usage: ./build/src/program <module name>

Available modules:
  module1
    Module 1 does some stuff
  module2
    Module 2 does some other stuff

Trying to enable a nonexistent module:

$ cmake -B build -DENABLED_MODULES=invalidmodule >/dev/null \
  && cmake --build build \
  && ./build/src/program
CMake Error at src/CMakeLists.txt:26 (message):
  ENABLED_MODULES contains unknown module 'invalidmodule'.  Full list:
  module1;module2

Building shared library modules with BUILD_SHARED_LIBS:

$ cmake -B build -DENABLED_MODULES= -DBUILD_SHARED_LIBS=ON >/dev/null \
  && cmake --build build \
  && ./build/src/program
ninja: no work to do.
Usage: ./build/src/program <module name>

Available modules:
  module1
    Module 1 does some stuff
  module2
    Module 2 does some other stuff

Module registration

The secret sauce is in the module_registration library.

module_registration.hpp
#pragma once

#include <string>
#include <map>
#include <memory>
#include <functional>

/**
 * Base class for all modules.
 */
struct BaseModule {
  BaseModule(std::string parameter) : parameter_{std::move(parameter)} {}
  virtual ~BaseModule() = default;

  virtual int process() = 0;

private:
  std::string parameter_{};
};

/**
 * Stuff about a module, including how to initialise it.
 */
struct ModuleInfo {
  using Factory = std::function<std::unique_ptr<BaseModule>(std::string)>;

  std::string name;
  std::string description;
  Factory factory;
};

using ModuleRegistry = std::map<std::string, ModuleInfo, std::less<>>;

/**
 * Add a module to the global registry.
 */
void register_module(ModuleInfo&& meta);

/**
 * Get all available modules.
 */
auto get_modules() -> const ModuleRegistry&;

/**
 * Utility class for registering a module at static initialisation.
 *
 * C++ doesn't (yet) natively support running arbitrary code at static
 * initialisation, so we use a class constructor as a simple workaround.
 */
struct RegisterModule {
  RegisterModule(ModuleInfo info) {
    register_module(std::move(info));
  }
};

In the header we provide BaseModule which, as the name suggests, is the base class for all modules.

After it, we define a ModuleInfo which contains some meta information for a module - in particular, how to initialise it in order to return a std::unique_ptr<BaseModule>. Of course, the factory could be replaced by a function which actually goes off and does the processing, but this system is a bit closer to our architecture. It also means that we can run a lot of the common stuff separately, and call into the module primarily for its main processing loop.
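As a hypothetical alternative (not what the repository does), the registry entry could instead hold a function that runs the module directly, at the cost of pushing more of the common setup into each module:

#include <functional>
#include <string>

// Hypothetical alternative to ModuleInfo: register a "run" function rather
// than a factory, so each entry performs its own setup and processing.
struct RunnableModuleInfo {
  std::string name;
  std::string description;
  std::function<int(std::string)> run;  // takes the parameter, returns an exit code
};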

We then provide a couple of free functions for accessing the global map - from here on referred to as the "registry".

register_module() adds a module to the registry. get_modules() returns a constant reference to the registry.

module_registration.cpp
1#include "module_registration.hpp"
2
3#include <cstdio>
4#include <fmt/core.h>
5#include <fmt/format.h>
6
7namespace {
8
9// Use a static within a function as the global registry, to ensure it is
10// always initialised before use.
11auto get_registry() -> ModuleRegistry& {
12 static auto registry = ModuleRegistry{};
13 return registry;
14}
15
16}
17
18void register_module(ModuleInfo&& info) {
19 const auto [it, inserted] = get_registry().try_emplace(info.name, info);
20 if (not inserted) {
21 fmt::println(stderr, "Warning: duplicate module registered: {}", it->first);
22 }
23}
24
25auto get_modules() -> const ModuleRegistry & {
26 return get_registry();
27}

In the source file, we primarily have our global map, which is returned by get_registry(). Notably, making it a static variable inside a function ensures it's always initialised before use: a function-local static is constructed the first time the function is called, which sidesteps the unspecified initialisation order of globals across translation units.
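For contrast, here's a hypothetical counter-example (not code from the repository) showing why a plain namespace-scope global would be risky:

#include "module_registration.hpp"

// Risky (hypothetical): a namespace-scope global registry. Globals in
// different translation units are initialised in an unspecified order, so a
// RegisterModule object elsewhere could call register_module() before this
// map's constructor has run.
ModuleRegistry global_registry{};

// Safe: a function-local static is constructed on first use, so by the time
// register_module() touches it, it is guaranteed to be initialised.
auto get_registry_safe() -> ModuleRegistry& {
  static auto registry = ModuleRegistry{};
  return registry;
}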

Implementing a module

module1.hpp
#include "module_registration/module_registration.hpp"

#include <fmt/format.h>

namespace module1 {

/**
 * Module 1 does some stuff.
 */
class Module1 : public BaseModule {
public:
  Module1(std::string parameter) : BaseModule{std::move(parameter)} {}

  int process() override {
    fmt::println("Module 1 has done some stuff");
    return 1;
  }
};

} // namespace module1

Let's say we have a module implemented as above.

Within the source file, we register it with a static RegisterModule object:

module1_main.cpp
#include <fmt/format.h>

#include "module1/module1.hpp"
#include "module_registration/module_registration.hpp"

namespace module1 {

[[maybe_unused]] static auto registrar_ = RegisterModule(ModuleInfo{
    .name = "module1",
    .description = "Module 1 does some stuff",
    .factory = [](std::string parameter) -> std::unique_ptr<BaseModule> {
      return std::make_unique<Module1>(std::move(parameter));
    },
});

} // namespace module1

The name of the variable doesn't matter, of course. I've added [[maybe_unused]] to prevent static analysers from complaining about the apparently unused symbol - they can't determine its purpose, after all.

main

Finally, we come to main(). Its purpose is to initialise and run the right module. I've implemented some basic command-line handling to make the program usable.

main.cpp
#include <cstdlib>
#include <fmt/core.h>
#include <fmt/format.h>
#include <fmt/ranges.h>

#include "module_registration/module_registration.hpp"

int main(int argc, char *argv[]) {
  if (argc < 2) {
    fmt::println("Usage: {} <module name>", argv[0]);
    fmt::println("");
    fmt::println("Available modules:");
    for (const auto &[name, module] : get_modules()) {
      fmt::println("  {}", name);
      fmt::println("    {}", module.description);
    }

    return EXIT_SUCCESS;
  }

  if (auto result = get_modules().find(argv[1]);
      result != get_modules().end())
  {
    auto instance = result->second.factory("Some parameter");
    return instance->process();
  } else {
    fmt::println(stderr, "Unknown module '{}'", argv[1]);
    return EXIT_FAILURE;
  }
}

In a fully-fledged application, the command-line handling would naturally be much more complex. In our application, for example, most of the command-line parameters are passed into the module's factory.

Build configuration glue

Here's a snippet of the fun bit: the CMake build configuration.

CMakeLists.txt
set(CMAKE_CXX_STANDARD 20)

include(${CMAKE_CURRENT_LIST_DIR}/cmake/add_module_test.cmake)
include(${CMAKE_CURRENT_LIST_DIR}/cmake/add_module.cmake)

add_subdirectory(module_registration)

add_subdirectory(module1)
add_subdirectory(module2)

set(ALL_MODULES
  module1
  module2
)
set(ENABLED_MODULES ${ALL_MODULES} CACHE STRING
  "Semicolon-separated list of enabled modules. Possible values: ${ALL_MODULES}"
)

# If the user leaves the option blank, build everything
if (NOT ENABLED_MODULES)
  set(ENABLED_MODULES ${ALL_MODULES})
endif()

foreach(module IN LISTS ENABLED_MODULES)
  if (NOT module IN_LIST ALL_MODULES)
    message(FATAL_ERROR "ENABLED_MODULES contains unknown module '${module}'. "
      "Full list: ${ALL_MODULES}")
  endif()
endforeach()

# ... (other configuration omitted)

add_executable(program
  main.cpp
)

target_link_libraries(program
  PRIVATE
    ${ENABLED_MODULES}
    fmt::fmt
    module_registration
)

As initially described, it creates a build option ENABLED_MODULES listing the CMake targets (one per module) which are linked into the main executable.

The list is semicolon-separated because that's how CMake does things; a little extra code could accept more user-friendly comma- or space-separated values. I've also implemented some basic validation to ensure the specified modules exist.

Weird things and lessons learnt

If you browse the CMake configuration in the example project, you'll notice it's a bit more complex than I made it out to be in this article.

There are also a couple of commented-out workarounds for issues I encountered, at least with the version of CMake we were using at the time.

Keeping the static registration objects in the libraries

After implementing the initial proof of concept, the first hurdle I ran into was a pretty confusing one. None of the modules were registering themselves at static init. The RegisterModule constructor wasn't being called, and I couldn't set a breakpoint on the static registrar_ variable within the module itself.

I eventually realised that the static variables weren't present in the final executable, or in the module library, which I could check with nm --demangle. At that point, I was building each module as a static library. That's the default type if BUILD_SHARED_LIBS is OFF.

This seems to be the default behaviour of the GNU toolchain when linking against static libraries: the linker only pulls in archive members that resolve a referenced symbol, so the apparently unused static registration objects are dropped.

I found quite a few discussions on how to work around this, with suggestions including tweaking the link flags so the linker keeps the whole archive (along the lines of GNU ld's --whole-archive).

Messing with the link flags isn't particularly easy in CMake, but thankfully object libraries are functionally equivalent. They have the added benefit of being portable, rather than relying on linker-specific options.

That's the solution I ended up going with: compile the module libraries as OBJECT libraries. I haven't encountered any such issue when the modules are built as shared libraries, so there's no workaround required when BUILD_SHARED_LIBS=ON.

add_module.cmake
if (BUILD_SHARED_LIBS)
  # With shared objects, both modules and the registration library must be
  # shared to ensure the same global registry is being mutated.
  set(MODULE_LIBRARY_TYPE SHARED)
else()
  # Otherwise, we build individual modules as OBJECT libraries, so that the
  # apparently unused static RegisterModule symbol is not omitted.
  set(MODULE_LIBRARY_TYPE OBJECT)
endif()

Resolving code coverage metrics

After the module system was functioning and applied to all of our existing modules, I noticed that code coverage metrics were significantly lower compared to the master branch.

We collect these coverage metrics using gcov, and generate a nice report using gcovr.

I think this is a bug in gcov. I found this mailing list discussion which produced a patch:

Should templates with multiple instantiations contribute to summaries in gcov?

In the meantime, I worked out that we could insert an intermediate static library between the module being tested and the final test executable. This essentially undoes the earlier workaround for keeping the "unused" static registration objects in the library. My guess is that the workaround also keeps template instantiations that the test executable never actually uses, and these linger around and drag down the coverage statistics; the intermediate static library lets the linker discard them again.

This workaround appears to have no impact on metrics in the example project (tested with GCC 13), but I left it commented out in the add_module_test function to show how it'd work:

add_module_test.cmake
# params <module name> [source files, ...]
function(add_module_test module_target)
  set(sources ${ARGV})
  list(REMOVE_AT sources 0)

  set(options --coverage)

  # The following workaround may be necessary if your code coverage metrics
  # seem lower than expected. I think this is due to gcov not accounting for
  # multiple template instantiations in its metrics:
  # https://www.mail-archive.com/gcc%40gcc.gnu.org/msg98590.html
  # Add an intermediate static (or shared) library to remove the superfluous
  # symbols from the module library.

  # add_library(${module_target}_test_library)
  # target_link_libraries(${module_target}_test_library PUBLIC ${module_target})

  # target_compile_options(${module_target}_test_library PUBLIC ${options})
  # target_link_options(${module_target}_test_library PUBLIC ${options})
  # target_link_libraries(${module_target}_test
  #   ${module_target}_test_library
  # )

  add_executable(${module_target}_test ${sources})

  target_link_libraries(${module_target}_test
    GTest::gtest_main
    ${module_target}
  )

  target_compile_options(${module_target}_test PUBLIC ${options})
  target_link_options(${module_target}_test PUBLIC ${options})

  gtest_discover_tests(${module_target}_test)
endfunction()

Shared libraries haven't helped build times

Also, after implementing the system and porting all of our modules to it, I did some crude benchmarking on my development laptop using the time command. I compared full and incremental builds with a single module enabled against builds with all modules enabled, using both static and shared libraries.

Unsurprisingly, the single-module builds - and particularly the incremental builds - were much, much faster than when all modules were enabled.

Surprisingly, building with static libraries was consistently faster than with shared libraries by a few seconds, for a ~500MB executable with debug symbols. In theory, I'd have expected static linking to be slower, with more code being copied into the executable, especially with link-time optimisation. But I'm not an expert, and my measurements didn't break the build down into compile and link times.

I'm sure this all varies heavily with the compiler and linker used anyway.

A lesson in gradual migration

I worked on this on and off over a long period: from getting an initial proof of concept working, to converting a handful of modules to the new system, to getting the go-ahead to convert all the existing modules. Then came the final push, where we tested it thoroughly in a lower environment, merged, and got it deployed. We had no trouble with the deployment, and the new system has remained in place ever since.

Part of the long turnaround was simply finding the time alongside all the other work that needed to be done.

In hindsight, there was no real reason we had to convert all the modules in one go. Certainly we got the most benefit from having every module optional immediately, but that meant maintaining a branch over a long period, requiring many painful rebases.

A gradual implementation would have been quite simple:

  1. search for a module using the mechanism described in this article, and use it if found; otherwise
  2. fall back to the big if block described at the start of the article, as sketched below
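A rough sketch of what that hybrid run_for_dataset() might have looked like (illustrative only - run_dataset_1 and friends stand in for the legacy entry points):

#include <stdexcept>
#include <string>
#include <utility>

#include "module_registration/module_registration.hpp"

// Legacy per-dataset entry points (declared elsewhere in the real application).
void run_dataset_1();
void run_dataset_2();

void run_for_dataset(const std::string& dataset_name, std::string parameter) {
  // 1. Prefer a module registered via the new mechanism, if one is available.
  const auto& modules = get_modules();
  if (const auto it = modules.find(dataset_name); it != modules.end()) {
    it->second.factory(std::move(parameter))->process();
    return;
  }

  // 2. Otherwise, fall back to the legacy if-else chain.
  if (dataset_name == "dataset1") {
    run_dataset_1();
  } else if (dataset_name == "dataset2") {
    run_dataset_2();
  } // ...
  else {
    throw std::runtime_error{"Unknown dataset"};
  }
}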

Converting a single module, testing, and deploying it would have served as an early proof of concept, which we could then have gradually extended to the remaining modules. It may have taken longer to see the full build time improvements, but perhaps the effort could have been shared amongst the team.