A simple C++ module system with CMake
27 November 2024
Implementing a module/plugin system for a C++ application for faster compile times.
One way to implement a simple module/plugin system in C++ using CMake.
By module or plugin, I mean replaceable software components, each serving a similar function but with a different implementation. I don't mean C++20 modules.
Background
In my current role, my team and I have been in the gradual process of converting a lot of Python code into C++. The software handles large volumes of time-series datasets, and is consequently difficult to parallelise. With the volume of data increasing over time, the processing time of the Python processors had been steadily creeping up - taking on the order of hours for some datasets.
After converting legacy Python code for a dataset into C++, we typically see processing times drop by a factor of 10 or so - the conversions have been a great success. But as more code was implemented in the C++ application, build times have also steadily increased, to the point of slowing down day-to-day development and CI.
The original architecture
At a high level, the purpose of this application is pretty simple: given a particular dataset, run the code which processes that dataset to generate our desired output. The single executable therefore contains all the code needed to handle every dataset.
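In main(), dataset selection was one big if block keyed on the dataset name, and the code for every dataset had to be #included so that each branch could construct its processor.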
Under the hood, each dataset processor is implemented using a heavily templated class. Each class inherits from a thin virtual base class, so we have a shared processing loop which uses dynamic dispatch on the processor object.
Templated classes in C++ are typically defined and implemented in headers, so that's what developers tended to stick to. This resulted in a huge number of template instantiations within a single translation unit: the main source file.
Not only did this take a long time and consume gigabytes of memory (to the point where our build agents started failing), it was also wasteful for an individual developer's purposes. While working on a specific dataset, a developer typically only changes, and runs, the code for that one dataset. But each time the main executable was rebuilt for testing, a lot of irrelevant code for other datasets was recompiled and relinked.
At some point, developers started locally commenting out the code for other datasets in main() to dramatically reduce their build times. But local patches are inherently fragile: they produce merge conflicts, and risk being accidentally committed and included in a code review.
It was clear something had to be done. While I suggested moving template implementations into source files as much as possible, this wouldn't help with the extreme link times, and doing it manually was very tedious. Alternative linkers like mold are available, but I haven't found them much help in the build time department. Ideally, we'd have a system that would allow building the implementation(s) for just the required dataset(s).
Other developers had started using a script which automatically edited main() for you, while I started on a build system solution. This ended up taking much longer to merge than I expected.
A simple run-time module system
To summarise, I was aiming for a solution which:
1) completely avoided compiling & linking code for unwanted datasets
2) didn't require any local code changes to select the desired datasets
3) supported building shared libraries rather than just static, in case that helped build times
4) was reasonably straightforward to use
CMake is our build system, so checking off 4) may be inherently difficult to achieve. But my first thought was to use CMake build options to pick which dataset targets the final executable should depend on. This would trivially check the first box, as a CMake build tool will only build and link the targets required by the target being built. It should also make 3) easy enough: we should be able to trivially switch the module targets from STATIC to SHARED libraries.
The question then is how to solve 2). Obviously the big if block had to go, because we don't want to #include the code for each dataset, and not all of them will be available. What we needed was a sort of plugin system, where each plugin would automatically register itself when it was compiled as part of the application.
I struggle to recall exactly where I got this pattern from, but in the end I opted for a system which made use of static initialisation:
- Create a static global map
- Create a class which, in its constructor, inserts into that static global map information about a specific "module" which handles a dataset
- Create a static instance of the above class for each module, particularly including a function for initialising the module
So by the time we hit main(), all of the static objects for each module should be initialised, populating the global map. To run a specific module, we search that map for the name we're looking for, and execute the function to initialise and/or run it.
I've implemented a simple cut-down version of this system in this Git repository.
Hopefully the code should be mostly self-explanatory, but I'll step through it below.
Building & running the example project
$ ./build/src/program
Usage: ./build/src/program <module name>
Available modules:
module1
Module 1 does some stuff
module2
Module 2 does some other stuff
$ ./build/src/program module1
Module 1 has done some stuff
$ ./build/src/program module2
Module 2 has done some stuff
Build with all modules:
$ cmake -B build -DENABLED_MODULES= >/dev/null \
&& cmake --build build \
&& ./build/src/program
ninja: no work to do.
Usage: ./build/src/program <module name>
Available modules:
module1
Module 1 does some stuff
module2
Module 2 does some other stuff
Enable just module1:
$ cmake -B build -DENABLED_MODULES=module1 >/dev/null \
&& cmake --build build \
&& ./build/src/program
[2/2] Linking CXX executable src/program
Usage: ./build/src/program <module name>
Available modules:
module1
Module 1 does some stuff
Enable both modules:
$ cmake -B build -DENABLED_MODULES='module1;module2' >/dev/null \
&& cmake --build build \
&& ./build/src/program
[2/2] Linking CXX executable src/program
Usage: ./build/src/program <module name>
Available modules:
module1
Module 1 does some stuff
module2
Module 2 does some other stuff
Trying to enable a nonexistent module:
$ cmake -B build -DENABLED_MODULES=invalidmodule >/dev/null \
&& cmake --build build \
&& ./build/src/program
CMake Error at src/CMakeLists.txt:26 (message):
ENABLED_MODULES contains unknown module 'invalidmodule'. Full list:
module1;module2
Building shared library modules with BUILD_SHARED_LIBS:
$ cmake -B build -DENABLED_MODULES= -DBUILD_SHARED_LIBS=ON >/dev/null \
&& cmake --build build \
&& ./build/src/program
ninja: no work to do.
Usage: ./build/src/program <module name>
Available modules:
module1
Module 1 does some stuff
module2
Module 2 does some other stuff
Module registration
The secret sauce is in the module_registration library.
module_registration.hpp
```cpp
// ...

/**
 * Base class for all modules.
 */
// ... (BaseModule: a constructor, a virtual destructor, a virtual method
//      returning int, and a private std::string member)

/**
 * Stuff about a module, including how to initialise it.
 */
struct ModuleInfo {
    using Factory = std::function<std::unique_ptr<BaseModule>()>;

    std::string name;
    std::string description;
    Factory factory;
};

using ModuleRegistry = std::map<std::string, ModuleInfo, std::less<>>;

/**
 * Add a module to the global registry.
 */
void register_module(/* ... */);

/**
 * Get all available modules
 */
auto get_modules() -> const ModuleRegistry&;

/**
 * Utility class for registering a module at static initialisation.
 *
 * C++ doesn't (yet) natively support running arbitrary code at static
 * initialisation, so we use a class constructor as a simple workaround.
 */
// ... (RegisterModule)
```
In the header we provide BaseModule, which, as the name suggests, is a base class for all modules.
After it, we define a ModuleInfo struct which contains some meta information for a module: in particular, a factory describing how to initialise it, which returns a std::unique_ptr<BaseModule>.
Of course, the factory could be replaced by a function which actually goes off and does the processing, but this system is a bit closer to our architecture. It also means that we can run a lot of the common stuff separately, and call into the module primarily for its main processing loop.
We then provide a couple of functions for accessing the global map, from here on referred to as the "registry". register_module() adds a module to the registry. get_modules() returns a constant reference to the registry.
module_registration.cpp
```cpp
// ...

// Use a static within a function as the global registry, to ensure it is
// always initialised before use.
auto get_registry() -> ModuleRegistry& {
    static auto registry = ModuleRegistry{};
    return registry;
}

// ...

void register_module(/* ... */) {
    // ... (inserts the module into get_registry(), with an error check)
}

auto get_modules() -> const ModuleRegistry& {
    return get_registry();
}
```
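In the source file, we primarily have our global map, which is returned by get_registry(). Notably, making it a static variable inside a function ensures it's always initialised before use: it is constructed on the first call, no matter which module's static object makes that call, whereas the order in which namespace-scope statics in different translation units are initialised is unspecified.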
Implementing a module
module1.hpp
```cpp
// ...

namespace module1 {

/**
 * Module 1 does some stuff.
 */
// ... (a class deriving from BaseModule, whose public int-returning method
//      ends with `return 1;`)

} // namespace module1
```
Let's say we have a module implemented as above.
Within the source file, we register it with a static RegisterModule object:
module1_main.cpp
```cpp
// ...

namespace module1 {

[[maybe_unused]] static auto registrar_ = RegisterModule({
    .name = "module1",
    .description = "Module 1 does some stuff",
    .factory = []() -> std::unique_ptr<BaseModule> {
        return /* ... */;
    },
});

} // namespace module1
```
The name of the variable doesn't matter, of course. I've marked it [[maybe_unused]] to stop static analysers complaining about an unused symbol; they can't determine its purpose, after all.
main
Finally, we come to main(). Its purpose is to initialise and run the right module. I've implemented some basic command-line handling to make the program usable.
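main.cpp
```cpp
// ...

int main(int argc, char* argv[]) {
    if (/* ... */) {
        // ... print the usage line and "Available modules:"
        for (/* ... each entry in get_modules() ... */) {
            // ... print the module's name and description
        }

        return EXIT_SUCCESS;
    }

    if (/* ... look up the requested module ... */;
        result != /* ... */.end()) {
        auto instance = result->second.factory();
        return instance->/* ... */;
    } else {
        // ... report the unknown module
        return EXIT_FAILURE;
    }
}
```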
In a fully-fledged application, the command-line handling would naturally be much more complex. In our application, for example, most of the command-line parameters are passed into the module's factory.
Build configuration glue
Here's a snippet of the fun bit: the CMake build configuration.
CMakeLists.txt
```cmake
# ... (ENABLED_MODULES cache option, described as:
#      "Semicolon-separated list of enabled modules. Possible values: ...")

# If the user leaves the option blank, build everything
# ...

# ... (validation: an unknown entry stops configuration with an error
#      message ending "Full list: ...")

# ... (the enabled module targets are linked PRIVATE into the executable)
```
As initially described, it creates a build option, ENABLED_MODULES, which lists the CMake targets (one per module) that are linked into the main executable.
The list is semicolon-separated because that's how CMake does lists; some extra code would allow more sensible comma- or space-separated values. I've also implemented some basic validation to ensure the specified targets are valid.
Weird things and lessons learnt
If you browse the CMake configuration in the example project, you'll notice it's a bit more complex than I made it out to be in this article.
There are also a couple of commented-out workarounds for issues I encountered, at least with the version of CMake we were using at the time.
Keeping the static registration objects in the libraries
After implementing the initial proof of concept, the first hurdle I ran into was a pretty confusing one. None of the modules were registering themselves at static init. The RegisterModule constructor wasn't being called, and I couldn't set a breakpoint on the static registrar_ variable within the module itself.
I eventually realised that the static variables weren't present in the final executable, or in the module library, which I could check with nm --demangle.
At that point, I was building each module as a static library. That's the default type if BUILD_SHARED_LIBS is OFF.
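This seems to be the default behaviour of GCC and the GNU linker when they encounter static objects that are apparently unused. When linking against a static library, the linker only pulls in the archive members (object files) that resolve a symbol something else already references. Nothing references a module's object file directly, so it is left out of the executable, and with it the static registrar_ and its constructor.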
I found quite a few discussions on how to work around this, including:
- Static Variables Initialization in a Static Library, Example (C++ Stories, 2018)
- How to force gcc to link an unused static library (Stack Overflow, 2013)
- prevent gcc from removing an unused variable (Stack Overflow, 2015)
Suggestions included:
- Adding the GCC attribute __attribute__((used)) to registrar_. With this, registrar_ is included in the module static library, but not in the final executable.
- Linking the static library with the --whole-archive linker option.
Messing with the link flags as in the latter solution isn't particularly easy in CMake, but thankfully OBJECT libraries achieve the same effect here: instead of being archived, their object files are linked directly into the target that uses them, so every object file, including each module's static registrar_, ends up in the executable. This has the added benefit of being portable, rather than relying on linker-specific options.
That's the solution I ended up going with: compile the module libraries as OBJECT libraries.
I haven't encountered any such issue when the modules are built as shared libraries, so no workaround is required when BUILD_SHARED_LIBS=ON.
add_module.cmake
```cmake
if(BUILD_SHARED_LIBS)
    # With shared objects, both modules and the registration library must be
    # shared to ensure the same global registry is being mutated.
    # ...
else()
    # Otherwise, we build individual modules as OBJECT libraries, so that the
    # apparently unused static RegisterModule symbol is not omitted.
    # ...
endif()
```
Resolving code coverage metrics
After the module system was functioning and applied to all of our existing modules, I noticed that code coverage metrics were significantly lower compared to the master branch.
We collect these code metrics using Gcov, and generate a nice report using gcovr.
I think this is a bug in gcov. I found this mailing list discussion which produced a patch:
Should templates with multiple instantiations contribute to summaries in gcov?
In the meantime, I worked out that we could use an intermediate static library between the module being tested and the final test executable. This essentially undoes the above workaround for keeping the "unused" static registration objects in the library. Presumably, without it, template instantiations that the test executable never actually uses linger around and drag down the coverage statistics.
This workaround appears to have no impact on metrics in the example project (tested with GCC 13), but I left it commented out in the add_module_test function to show how it'd work:
add_module_test.cmake
```cmake
# params <module name> [source files, ...]
# ...

    # The following workaround may be necessary if your code coverage metrics
    # seem lower than expected. I think this is due to gcov not accounting for
    # multiple template instantiations in its metrics:
    # https://www.mail-archive.com/gcc%40gcc.gnu.org/msg98590.html
    # Add an intermediate static (or shared) library to remove the superfluous
    # symbols from the module library.

    # add_library(${module_target}_test_library)
    # target_link_libraries(${module_target}_test_library PUBLIC ${module_target})

    # target_compile_options(${module_target}_test_library PUBLIC ${options})
    # target_link_options(${module_target}_test_library PUBLIC ${options})
    # target_link_libraries(${module_target}_test
    #     ${module_target}_test_library
    # )

# ...
```
Shared libraries haven't helped build times
Also after implementing and porting all the modules to the system, I did some crude benchmarking on my development laptop with time. I tested the following combinations with both static and shared libraries:
- a clean build with all modules enabled
- an incremental build with all modules enabled
- a clean build with a single module enabled
- an incremental build with a single module enabled
Unsurprisingly, the single-module builds, particularly the incremental one, were much, much faster than those with all modules enabled.
Surprisingly, building with static libraries was consistently a few seconds faster than with shared libraries, for a ~500MB executable with debug symbols. In theory, linking statically should be slower, because more code is copied into the executable, and more so with link-time optimisation. But I'm not an expert, and my measurements didn't break the build down into compile and link times.
I'm sure this all varies heavily with the compiler and linker used anyway.
A lesson in gradual migration
Working on this on and off took a long time: from getting an initial proof of concept working, to converting a handful of modules to the new system, to getting the go-ahead to convert all the existing modules. Then came the final push, where we tested it thoroughly in a lower environment, merged it and got it deployed. We had no trouble with the deployment, and the new system has remained in place ever since.
Part of the long turnaround was due to finding the time alongside all the other work that needed to be done.
In hindsight, there was no real reason we had to convert all the modules in one go. Certainly we get the most benefit from having all modules immediately optional, but that meant maintaining a branch over a long period, requiring many painful rebases.
A gradual implementation would have been quite simple:
- search for a module using the mechanism described in this article, and use it if found, otherwise
- fall back to the big if block described at the start of the article
Converting, testing and deploying a single module this way would have served as an early proof of concept, which could then be gradually extended to each module. It may have taken longer to see build time improvements, but perhaps the effort could've been shared amongst the team.