A Journey into Nix and NixOS
21 October 2022
I chronicle my introduction to and experiments with using a rather radical GNU/Linux operating system on my webserver.
Before I started this journey, I ran this website and a few personal services on the cheapest VPS that OVH offers, but the limited disk space has frequently given me trouble. 20GB isn't a lot to work with when you're using it to synchronise and stash personal files. I planned to cancel its upcoming renewal, and instead migrate to another VPS provider that offers a bit better value for money. However, this introduces the problem of manually migrating and setting things up again. Naturally, having done most of the setup over a year ago, I'd forgotten most of it. I wasn't keen on repeating that mistake: I wanted a proper automated deployment method. One that doesn't rely on me doing everything manually, or rely on all the knowledge staying in my head.
Ansible and similar are the "industry-standard" tools for doing this. I've seen Ansible used at a previous job, and witnessed how slow it can be, and how messy it can get with its piles of YAML. Needless to say, I wasn't left with the best impression, and wasn't too keen on it for personal use. I'd come across Nix online through articles and discussions, including Xe Iaso's blogs and talks, and the Self Hosted podcast. Its approach sounded novel and interested me, so I took the server migration as an opportunity to experiment with it.
I wrote most of this article as I went along, and it chronicles most of my trial-and-error in creating a NixOS configuration. It's mostly a stream of consciousness, serving primarily as a record of what I struggled with and thoughts that I had along the way. In a subsequent article, I'll write up my perspective on Nix and NixOS and what I'd like to see from it in the future. That one will hopefully be less ramble-y and more interesting to the average reader.
It's been a long road: I've been working on it on and off for months now, starting in May 2022 and continuing until just before this article was published. Thankfully I consider this experiment a success: my data and services had all been migrated to the new server before my old VPS was up for renewal, as I'd planned. It's been quite messy and frustrating at times, but I'm quite happy with how things turned out. I think all this work is a "once and for all" effort to ensure I don't have to manually set up my server again. I'm also reasonably happy with Nix, and I think I'll be using it more in the future.
If you're interested in viewing the end result (i.e. my NixOS configuration), you can find it in my configuration repository.
Preparatory reading
As with most things, I initially started with some web searches, and came across How to Learn Nix, which I ended up reading through quite a lot of.
I additionally came across Nix Pills, which I think I briefly skimmed, and the Nix Wiki, which I initially referred to frequently while getting started.
Installing on my Arch computer
Fairly straightforward, it's available in Arch's community repository, so I installed it through pacman. After hitting some errors, I followed the wiki's guide and added the nixpkgs-unstable channel.
Then I looked at managing user config files. I found home-manager but it sounded like way more than I need - I want to keep my dotfiles for portability when I can't use Nix, I just need a little Nix expression for installing the few config files that need to live outside of ~/.config as symlinks.
A user Nix config for installing applications would be neat, but I decided to postpone until I'm further along. I decided to make my first goal reproducing my webserver configuration with Nix. So on to setting up a playground environment before investing in another webserver.
Raspberry Pi
I have a Raspberry Pi free for some experimentation, so I thought I'd use it as a testbed in place of a real VPS.
The wiki article suggests downloading an image and writing it to the SD card, but it's a graphical installer. I don't have a micro HDMI cable for the Pi so I want a headless installation with SSH (which is what I used with Arch on ARM). I assume I need to build my own image as suggested in the wiki, including my SSH key. I initially followed the wiki article, using cross-compilation.
{ ... }: {
  nixpkgs.crossSystem.system = "aarch64-linux";
  imports = [
    <nixpkgs/nixos/modules/installer/sd-card/sd-image-aarch64.nix>
  ];
  users.extraUsers.root.openssh.authorizedKeys.keys = [
    "ssh-rsa ...."
  ];
}
What does the command mean? Strangely it wanted to rebuild a lot of applications. Why isn't it downloading everything from the cache? The first roadblock was an error in liburing, but hmm, this appears to be building fine on hydra. I find the --keep-failed option to actually inspect what's wrong. Something in the post install script referred to a nonexistent file. Tried adding the nixos-21.11 and nixpkgs-21.11-aarch64 channels - no change. Found the --dry-run option - why are the hashes on hydra different to my nix-build? (Was it because I didn't pass --pure?) I followed some of my previous reading material and managed to override the liburing derivation using overrideAttrs:
{ ... }: {
  nixpkgs.crossSystem.system = "aarch64-linux";
  imports = [
    <nixpkgs/nixos/modules/installer/sd-card/sd-image-aarch64.nix>
  ];
  nixpkgs.overlays = [
    /*
    (self: super:
    {
      liburing = super.liburing.overrideAttrs (attrs: {
        postInstall = ''
          # Copy the examples into $bin. Most reverse dependency of this package should
          # reference only the $out output
          mkdir -p $bin/bin
          cp ./examples/io_uring-cp examples/io_uring-test $bin/bin
          cp ./examples/link-cp $bin/bin/io_uring-link-cp
        '';
      });
    })
    */
  ];
  users.extraUsers.root.openssh.authorizedKeys.keys = [
    "ssh-rsa ...."
  ];
}
Also try the undocumented --store option to override the store because my root drive is running out of space. This causes issues later, so I clear up some space and use the default /nix/store instead.
More building...
Why am I building the Linux kernel? And the AMD drivers and Nouveau? I don't
need those. I hit "no space on device" errors, my tmpfs is running out of space
building all of this stuff.
Find some GitHub issues that mention remounting /tmp and resizing to 14GB. Fine, I do that. (Later someone mentions that it actually uses $TMPDIR rather than hardcoding /tmp, so I could have used a different filesystem.)
Find the nix-store --read-log option, which is quite neat. Before finding it, I ran builds several times over after losing the tmux scrollback buffer (and later increasing the tmux scrollback limit).
More building...
Make: error 2 during linking(?) At this point I was lurking in the NixOS on ARM channel, someone mentioned to someone else that cross compilation isn't reliable. Suggests the qemu-user method. Fine, time to try that.
I install qemu-user and the binfmt packages from the AUR, and add this to /etc/nix/nix.conf:
extra-sandbox-paths = /usr/bin/qemu-aarch64-static
extra-platforms = aarch64-linux
Still get some errors. I try overriding nixpkgs.system and system, no dice. Apparently --argstr system aarch64-linux isn't working right now. Find another undocumented option, --system, which does appear to work.
Later I find out (from the chat?) the distributed SD card image does support SSH, you just need to create /root/.ssh/authorized_keys on the flashed system. So that whole exercise was a waste of time.
I flash the SD card, plug Pi and Ethernet in, no green light, but lights on Ethernet. Pi debugging ensues, replugging the SD card before Ethernet eventually works. Pi shows up on the router's web UI, but isn't assigned an IP address. DHCPD issue? I have no idea, and I'm not sure how to debug. I give up, maybe I should try a VM, which should be simpler, right?
Building a QEMU image
I find a couple of guides for making VMs with Nix:
- https://gist.github.com/tarnacious/f9674436fff0efeb4bb6585c79a3b9ff
- https://gist.github.com/573/c1d73a4fd04b8f8ca63885393856f9ea
I also find a link to the VM Nix module, which has all the available options. Ideally I want no graphical window, just a shell.
Why isn't nix-build -A vm '<nixpkgs/nixos>' --arg configuration "{ imports = [ <nixpkgs/nixos/modules/virtualisation/build-vm.nix> ]; }" -I nixos-config=vm.nix working? It works if I put everything in vm.nix and add the build-vm.nix import. Probably the configuration argument is being ignored.
VM template:
{ pkgs, ... }:
{
  imports = [
    <nixpkgs/nixos/modules/profiles/qemu-guest.nix>
    <nixpkgs/nixos/modules/virtualisation/qemu-vm.nix>
  ];
  config = {
    system.stateVersion = "22.05";
    fileSystems."/" = {
      device = "/dev/disk/by-label/nixos";
      fsType = "ext4";
      autoResize = true;
    };
    boot = {
      growPartition = true;
      kernelParams = [ "console=ttyS0" ];
      loader.grub.device = "/dev/vda";
      loader.timeout = 0;
    };
    users.extraUsers.root.password = "";
    services.getty.autologinUser = "root";
    networking.hostName = "vm";
    virtualisation = {
      cores = 4;
      graphics = false;
      qemu.options = [ "-serial mon:stdio" ];
      # Forward any needed ports
      forwardPorts = [
        { from = "host"; host.port = 8080; guest.port = 80; }
      ];
    };
    nix.extraOptions = "
      extra-experimental-features = nix-command flakes
    ";
  };
}
Runs successfully with:
I'm not sure how to break down the nix-build call. Include <nixpkgs/nixos> and vm.nix, evaluate the expression vm? What is the vm attribute? Where does it come from?
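One way to read that call, offered as a sketch rather than an authoritative explanation: <nixpkgs/nixos> evaluates to a function that takes a configuration argument (defaulting to whatever nixos-config resolves to on the Nix path) and returns an attribute set, and -A vm selects the vm attribute of that set, which is taken from config.system.build.vm. Assuming vm.nix sits in the same directory, a roughly equivalent expression in a hypothetical file build-vm-example.nix would be:

```nix
# build-vm-example.nix (hypothetical file name, not from the article)
# Roughly what `nix-build '<nixpkgs/nixos>' -A vm -I nixos-config=vm.nix` evaluates.
let
  # <nixpkgs/nixos> is a function; calling it evaluates the given configuration.
  nixos = import <nixpkgs/nixos> { configuration = ./vm.nix; };
in
# The result is an attribute set; `vm` comes from config.system.build.vm.
nixos.vm
```

Running nix-build on this file should then build the same VM runner script, under the assumption that the nixpkgs version on the Nix path matches.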
Writing a Nix module for my webserver
Nextcloud
I first find the wiki page, but it's pretty barren. It doesn't even link to the manual page. I follow the next search result and find the manual section.
Setup is mostly straightforward, with most of the time spent trying to create these secret files on the disk. Required a bit more exploration of the Nix language.
I initially think I could manage them by overriding options/attributes when calling nix-build. However:
- the nix-build switch --option appears to refer to Nix options, i.e. things that go in /etc/nix/nix.conf
- --arg and --argstr appear to be function arguments, i.e. things specified in { arg }: at the start of a module
Is there a point in exploring this approach, if it'll be in plaintext in the store anyway? In the next section, I look into NixOps and if/how it handles secrets sensibly.
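To make the distinction concrete, here is a minimal sketch of a function-style Nix file whose parameters --arg and --argstr can fill in. The file name, the adminPass parameter and its default are all hypothetical, and the example also shows why this approach leaks: whatever is passed in ends up in a store path.

```nix
# secret-arg-example.nix (hypothetical file, not from the article)
# A plain function: --arg/--argstr fill in its parameters, e.g.
#   nix-build secret-arg-example.nix --argstr adminPass hunter2
{ pkgs ? import <nixpkgs> { }, adminPass ? "changeme" }:

# Caveat: the value is written into the world-readable Nix store.
pkgs.writeText "adminpass" adminPass
```

Note that when building through <nixpkgs/nixos>, these switches feed the arguments of that top-level function rather than the arguments of individual NixOS modules.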
Selecting Nextcloud apps is a bit janky. I was expecting it to just need the app name, but it wants you to manually specify the download URL, hash, and all that jazz. I assume it's for reproducibility, and that no one's created packages for Nextcloud apps yet.
Dealing with secrets
Sorting this out is probably going to be unavoidable if I want to use Nix "properly", that is, running one command to completely reproduce a system with virtually zero manual work required.
I find the wiki article on the subject.
Lots of options - none of them ideal or zero extra effort. Proper secret management within Nix and the Nix store has been on the back-burner for a long time: https://github.com/NixOS/nix/issues/8.
My ideal: hook into pass, my password manager, to retrieve my secrets. Let me specify the required ones in my configuration, and deploy them when I'm running NixOps or whatever, with appropriate (and configurable) file permissions. Let them be persistent so I don't have to redeploy them every time, and clean up older ones if I remove them from the config.
NixOps
Okay, the default NixOps key system looks fine. You can override where keys are saved so I can create them in /secrets or something. I'm thinking to use NixOps to deploy to real servers anyway, so I'm gonna have to learn it at some point. Looks like it has some features for delaying/restarting systemd units when secrets change or are made available, which is handy.
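As a rough sketch of what that key system looks like in a network file, assuming the deployment.keys options as I understand them from the NixOps manual: the machine name, key name, secret text and the choice of nextcloud-setup as the dependent unit are all placeholders for illustration.

```nix
# network.nix (hypothetical NixOps network file; names and values are placeholders)
{
  network.description = "secrets example";

  vm = { ... }: {
    deployment.targetHost = "vm";

    deployment.keys."nextcloud-admin" = {
      text = "not-a-real-password"; # placeholder secret
      destDir = "/secrets";         # instead of the default /run/keys
      user = "nextcloud";
      group = "nextcloud";
      permissions = "0400";
    };

    # Each key is exposed as a "<name>-key.service" unit that other units can wait on.
    systemd.services.nextcloud-setup = {
      after = [ "nextcloud-admin-key.service" ];
      wants = [ "nextcloud-admin-key.service" ];
    };
  };
}
```

The secret still lives in plaintext in the network file here, so this only addresses where the key lands on the target and not how it is stored on the deploying machine.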
What is up with the NixOps manual on nixos.org? It has very little info, is lacking context on how to write NixOps configuration files and load commands.
The proper manual appears to be on releases.nixos.org.
Can I override the SSH identity file used or do I really need to adjust my ~/.ssh/config?
I type in the host name as I would if I were running ssh <host>, but for some reason not all SSH options are picked up. E.g. I need to set deployment.targetPort, otherwise NixOps tries to connect to port 22.
$ nixops deploy is building stuff - is it using the version of nixpkgs I'm running NixOps from? Do I need to create another deployment to reload the NixOps network file?
No, it looks okay in the end. But hits some errors at the end of the deploy, I assume running post-update GRUB hooks?
vm> updating GRUB 2 menu...
vm> installing the GRUB 2 boot loader on /dev/vda...
vm> Installing for i386-pc platform.
vm> /nix/store/4bmcgmin11ixrn2ij2jjinlddcxql1g6-grub-2.06/sbin/grub-install: warning: File system `ext2' doesn't support embedding.
vm> /nix/store/4bmcgmin11ixrn2ij2jjinlddcxql1g6-grub-2.06/sbin/grub-install: warning: Embedding is not possible. GRUB can only be installed in this setup by using blocklists. However, blocklists are UNRELIABLE and their use is discouraged..
vm> /nix/store/4bmcgmin11ixrn2ij2jjinlddcxql1g6-grub-2.06/sbin/grub-install: error: will not proceed with blocklists.
vm> /nix/store/lc7cvzrgsxks87g3jnjbg8pg2pqn15vh-install-grub.pl: installation of GRUB on /dev/vda failed: No such file or directory
vm> error: Traceback (most recent call last):
File "/nix/store/4zh7crx1r2sizvpyb1c9h109yfimlzn2-python3-3.9.12-env/lib/python3.9/site-packages/nixops/deployment.py", line 893, in worker
raise Exception(
Exception: unable to activate new configuration (exit code 1)
Why is it installing for i386? My VM is x86_64. Why is it detecting the filesystem as ext2? /dev/vda is ext4.
I assume there are errors from updating within the VM itself, and probably not NixOps' fault. It just happened to update some packages.
Further research: grub-install is apparently trying to update an MBR partition with UEFI stuff? Try to force the VM to use UEFI with GRUB by inspecting the qemu module.
I find a related option, and try setting virtualisation.useEFIBoot = true, but get the following error:
qemu-kvm: -drive if=pflash,format=raw,unit=1,file=: A block device must be specified for "file"
Further inspection: there is a dependency between useEFIBoot and useBootLoader (the latter defaults to false).
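In configuration terms, that dependency suggests setting the two options together. A minimal fragment, assuming both live under virtualisation as in the qemu-vm module, might look like this:

```nix
# Fragment for the VM configuration: the EFI option is paired with the
# boot loader option, following the dependency described above.
{
  virtualisation.useBootLoader = true; # defaults to false
  virtualisation.useEFIBoot = true;
}
```

This only gets past the pflash error; it does not by itself guarantee that the VM boots straight into the system.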
I take a brief detour to enable KVM in my BIOS.
With the new changes, the VM just boots into the UEFI shell, instead of directly to the system.
Back to the normal VM. It's quicker now due to the KVM modules loaded, but of course back to the same NixOps error.
Maybe I should create VMs through NixOps instead, which will be better supported? How about a quick detour and using sops-nix instead?
(In hindsight, this detour was not quick at all)
sops-nix and Flakes
Again, not ideal because I'd prefer to just use passwords stored in pass. But I understand why sops is preferable when it comes to Nix - the encrypted secrets can be used as an input to the system configuration. For now it seems like one of the easier options to go for. I like that it apparently hooks into nixos-rebuild and the like.
Quick question, how do I use nixos-rebuild when I'm not on NixOS? Turns out it's packaged in nixpkgs, duh. I run it with $ nix-shell -p nixos-rebuild
sops-nix requires the target host's SSH key which is rather annoying, since it means having to deploy after creating a machine. And some manual work involved in extracting the SSH public key of new machines.
Setup guide is rather convoluted and complex, due to multiple encryption methods: using age keys, converting GPG or ssh keys to age keys.
Try to copy sops-nix example, so I guess I'm trying out flakes now:
{
  inputs = {
    nixpkgs.url = "nixpkgs/nixpkgs-unstable";
    sops-nix.url = "github:Mic92/sops-nix";
    sops-nix.inputs.nixpkgs.follows = "nixpkgs"; # use the same nixpkgs from this flake's input, this seems hacky
  };
  outputs = { self, nixpkgs, sops-nix }: {
    nixosConfigurations."vm" = nixpkgs.lib.nixosSystem {
      system = "x86_64-linux";
      modules = [
        ./vm.nix
        sops-nix.nixosModules.sops
      ];
    };
  };
}
Try to deploy with $ nixos-rebuild switch --target-host vm --flake '.#vm'
error: getting status of '/nix/store/ppm1c1rch1p6w260kpfxma6kww8wkfdf-source/nix/flake.nix': No such file or directory
Okay, turns out I need to git add flake.nix for some reason.
error: cannot look up '<nixpkgs/nixos/modules/profiles/qemu-guest.nix>' in pure evaluation mode (use '--impure' to override)
I guess I need to explicitly add the nixpkgs flake?
Do flakes change the way imports work? How do I translate a <nixpkgs/...> import into an expression?
Now I try to rebuild my VM, but now I'm getting a new error:
error: The option `sops' does not exist. Definition values:
- In `/home/william/.config/nix/vm.nix':
{
defaultSopsFile = /home/william/.config/nix/secrets/common.yaml;
secrets = {
"nextcloud/admin" = { };
"nextcloud/database" = { };
Durr, this is because I'm still running nix-build but I'm using the sops option in my configuration. I assume I need to convert this to the new nix build command to make use of Flakes? Let's give this a shot.
The VM module creates system.build.vm, which is probably what I'm currently invoking with nix-build. The nix build --help output has the following example:
Build a NixOS system configuration from a flake, and make a profile point to the result:
# nix build --profile /nix/var/nix/profiles/system \
~/my-configurations#nixosConfigurations.machine.config.system.build.toplevel
That looks pretty similar! Surely I can change ~/my-configurations and machine appropriately, replace system.build.toplevel with system.build.vm, and that's my new build command?
$ nix build .#nixosConfigurations.vm.config.system.build.vm
warning: Git tree '/home/william/.config' is dirty
error: cannot look up '<nixpkgs/nixos/modules/profiles/qemu-guest.nix>' in pure evaluation mode (use '--impure' to override)
at /nix/store/h7ncs2xdj5nmfwfrgnhxv4wr0q6v6fqd-source/nix/vm.nix:8:5:
7| imports = [
8| <nixpkgs/nixos/modules/profiles/qemu-guest.nix>
| ^
9| <nixpkgs/nixos/modules/virtualisation/qemu-vm.nix>
(use '--show-trace' to show detailed location information)
Progress? Now, how to import these nixpkgs modules... they're not exposed through the nixpkgs flake, are they?
The Flakes wiki page has a Making your evaluations pure section, this suggests I need to somehow download these nixpkgs modules that I'm trying to import? Can I import a module within a flake's repository directly? Surely the entire repository is already in the nix store anyway?
Yes!
"I turns out that I've had a incorrect presumption. I don't grasp the reason but even in the case of being a flake input inputs.home-manager will still return the store path of the home-manager flake. Which means I can still use it as an ordinary non-flake input That's Great"
A comment by ykis-0-0 on reddit.com/r/NixOS
This hint led me to add a trace within the outputs function in flake.nix, and lo and behold, it just points to the store path for nixpkgs!
{
  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixpkgs-unstable";
    sops-nix.url = "github:Mic92/sops-nix";
    sops-nix.inputs.nixpkgs.follows = "nixpkgs"; # nasty hack to use the same nixpkgs from this flake's input
  };
  outputs = { self, nixpkgs, sops-nix }: {
    nixosConfigurations."vm" = nixpkgs.lib.nixosSystem {
      system = "x86_64-linux";
      modules = [
        (nixpkgs + "/nixos/modules/profiles/qemu-guest.nix")
        (nixpkgs + "/nixos/modules/virtualisation/qemu-vm.nix")
        ./vm.nix
        sops-nix.nixosModules.sops
      ];
    };
  };
}
Not ideal, since now the import has moved outside of the vm.nix module which actually needs these imports.
Question: Can I somehow move this import to inside the module?
(I found out the answer much later: Yes! Using specialArgs)
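A sketch of what that could look like, assuming flake.nix passes the nixpkgs input through specialArgs (for example specialArgs = { inherit nixpkgs; }; in the nixosSystem call). The string interpolation below relies on the flake input coercing to its store path:

```nix
# vm.nix, assuming flake.nix hands `nixpkgs` to modules via specialArgs.
{ nixpkgs, ... }:
{
  imports = [
    # "${nixpkgs}" expands to the store path of the nixpkgs flake input.
    "${nixpkgs}/nixos/modules/profiles/qemu-guest.nix"
    "${nixpkgs}/nixos/modules/virtualisation/qemu-vm.nix"
  ];
}
```

As far as I know, NixOS also hands every module a modulesPath argument pointing at nixos/modules, so (modulesPath + "/profiles/qemu-guest.nix") may be an alternative that needs no specialArgs at all.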
Getting another error now:
$ nix build .#nixosConfigurations.vm.config.system.build.vm
warning: Git tree '/home/william/.config' is dirty
error: getting status of '/nix/store/2awpx4cw6r4y4zxd3rnx93xq4h2mb9l3-source/nix/common.nix': No such file or directory
(use '--show-trace' to show detailed location information)
If I explore this store path, common.nix is indeed missing, but most of the files are there! Also, it contains a snapshot of my entire Git repository, most of which isn't needed by the Nix stuff. But that's suspicious - is it because common.nix isn't staged yet? Why indeed, it isn't. Is this what the warning is about? Maybe that warning should also specify that only staged files will be used during the build, and especially check special files like flake.nix. It's confusing for builds to immediately fail with an error that doesn't make sense unless you know that it's copying Git-tracked files to the Nix store. I have everything ignored by default in my Git repository (because it lives in ~/.config, which many applications dump files into), so I hit this problem repeatedly.
Finally, success! I've now converted to flakes, time to test out sops-nix again.
I can see sops-nix trying to do something at the start, but since I've wiped my VM its ssh-key has changed:
setting up secrets...
sops-install-secrets: Imported /etc/ssh/ssh_host_rsa_key with fingerprint cb3e5e4ce4278da9a1bb7beb7ff790665581e7ad
/nix/store/6a7866mckid9n3s8bvanp5ag5fk2awyl-sops-install-secrets-0.0.1/bin/sops-install-secrets: Failed to decrypt '/nix/store/p2inyam4wdxgljisqx035y1i8f85npi5-common.yaml': Error getting data key: 0 successful groups required, got 0
Activation script snippet 'setupSecrets' failed (1)
Time to redo part of the sops-nix setup. I get why it's done this way - using the host's SSH key is a neat idea - but I don't like that it's a manual task that needs to be done after initially booting a system. Maybe it's more of a problem with these throwaway VMs that I'm using.
Updated the age key for the VM, then ran sops updatekeys <file>. New error on boot:
setting up secrets...
sops-install-secrets: Imported /etc/ssh/ssh_host_rsa_key with fingerprint cb3e5e4ce4278da9a1bb7beb7ff790665581e7ad
/nix/store/6a7866mckid9n3s8bvanp5ag5fk2awyl-sops-install-secrets-0.0.1/bin/sops-install-secrets: Failed to decrypt '/nix/store/zl4yjsw14if8wqrblnf1lk50y01zvl6w-common.yaml': Error getting data key: 0 successful groups required, got 0
Activation script snippet 'setupSecrets' failed (1)
Is sops-nix trying to use the host's RSA key? I thought the age method only supported the ed25519 key? My mistake: the sops-nix guide says to set the following:
sops.age.sshKeyPaths = [ "/etc/ssh/ssh_host_ed25519_key" ];
Also, I was accidentally using my actual host's SSH keys because I ran $ ssh-keyscan localhost -p 2222 instead of $ ssh-keyscan -p 2222 localhost. That one's on ssh-keyscan.
On top of that, ssh-keyscan doesn't detect the ed25519 keys on the VM for some reason, so I had to copy the public key manually and pipe it through to ssh-to-age. One last $ sops updatekeys ..., and now there are secrets!
setting up secrets...
sops-install-secrets: Imported /etc/ssh/ssh_host_rsa_key with fingerprint cb3e5e4ce4278da9a1bb7beb7ff790665581e7ad
...
$ ll /run/secrets/
total 4
drwxr-x--x 2 root keys 0 May 29 12:59 nextcloud
-r-------- 1 root root 30 May 29 12:59 smtp
Went back and amended Nextcloud configuration to use sops-nix instead of Nix options.
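For illustration, the amended configuration might look something like the following sketch. The secret names match the ones in the earlier error message, while the relative path to common.yaml, the owner setting and the use of adminpassFile/dbpassFile are my assumptions about how the pieces fit together, not a copy of the real configuration.

```nix
# Sketch of a module wiring sops-nix secrets into Nextcloud (assumed layout).
{ config, ... }:
{
  sops.defaultSopsFile = ./secrets/common.yaml;
  sops.age.sshKeyPaths = [ "/etc/ssh/ssh_host_ed25519_key" ];

  # Decrypted at activation time under /run/secrets/nextcloud/...
  sops.secrets."nextcloud/admin".owner = "nextcloud";
  sops.secrets."nextcloud/database".owner = "nextcloud";

  services.nextcloud.config = {
    # Only the paths end up in the Nix store, not the secret contents.
    adminpassFile = config.sops.secrets."nextcloud/admin".path;
    dbpassFile = config.sops.secrets."nextcloud/database".path;
  };
}
```

The appeal of this arrangement is that the encrypted file can be committed alongside the configuration, while the plaintext only exists on the target machine.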
Finally, time to get back to the task at hand.
Quick diversion: setting up a Nix formatter
Initially running nix fmt gives a somewhat helpful (but not exactly user-friendly) error:
warning: Git tree '/home/william/.config' is dirty
error: flake 'git+file:///home/william/.config?dir=nix' does not provide attribute 'formatter.x86_64-linux'
It should probably indicate to check nix fmt --help, because that page has a few examples to copy-paste from. I chose nixpkgs-fmt and added this to the outputs set in my flake.nix as directed:
{
  outputs = { self, nixpkgs, ... }: {
    formatter.x86_64-linux = nixpkgs.legacyPackages.x86_64-linux.nixpkgs-fmt;
  };
}
Making a package: systemd-failmsg
I use systemd-failmsg to get emails when a service fails on my webserver. I definitely want it on my new server, but it's as-yet unpackaged. So I guess I need to create a proper Nix package for it?
Let's see this wiki page. It basically says "look for something similar in nixpkgs and copy that". Which isn't ideal. Some basic templates would be nice.
Okay, I've copied something that looks reasonable:
{ stdenv, lib, fetchFromGithub, ... }:
stdenv.mkDerivation rec {
  pname = "systemd-failmsg";
  version = "1.3";
  src = fetchFromGithub {
    owner = "dino-";
    repo = pname;
    rev = "v${version}";
    sha256 = "7a2a6cc9311f1370b1c295d3cf0428604e03ced08bb894e073cd58209f5ef537";
  };
  meta = with lib; {
    homepage = "https://github.com/dino-/systemd-failmsg";
    description = "systemd toplevel override which sends emails alerts when systemd services fail";
    license = licenses.isc;
    platforms = platforms.linux;
  };
}
How do I build it? $ nix build --file ... gives me:
error: anonymous function at /home/william/.config/nix/pkgs/systemd-failmessage.nix:1:1 called without required argument 'stdenv'
The wiki article doesn't go into detail on using $ nix build to build a package. Most of the existing documentation uses nix-build instead, which obviously doesn't make use of flakes.
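For context on that error: the package file is a function expecting stdenv, lib and friends, and nothing supplies them when the file is built directly. Outside of flakes, the usual way around this is a small wrapper that hands the file to callPackage. A minimal sketch, with a hypothetical wrapper file name and assuming a nixpkgs channel on the Nix path:

```nix
# build-failmsg.nix (hypothetical wrapper placed next to the package file)
{ pkgs ? import <nixpkgs> { } }:

# callPackage inspects the function's arguments (stdenv, lib, ...) and
# supplies them from pkgs, which a bare `nix build --file` does not do.
pkgs.callPackage ./systemd-failmessage.nix { }
```

With that in place, nix-build build-failmsg.nix should at least get past the missing-argument error.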
Okay, a different approach: specifying the package in flake.nix. The wiki page has very little detail on how to use outputs.packages other than specifying that it's used by $ nix build .#<package>. nixpkgs only uses legacyPackages, so I have to search elsewhere. I looked at the Flake RFC, in which a comment mentions the callPackage pattern used by nixpkgs for its legacy packages, but within the outputs function of my flake, nixpkgs just refers to the store path. It took me a bunch of searching until I found this blog which mentions actually importing nixpkgs to provide the classic pkgs input. Makes sense once it's done in an example!
{
  # ...
  outputs = { self, nixpkgs, sops-nix }:
    let
      forAllSystems = f: nixpkgs.lib.genAttrs nixpkgs.lib.systems.flakeExposed (system: f system);
    in
    {
      packages = forAllSystems (system: import ./pkgs {
        pkgs = import nixpkgs { inherit system; };
      });
      nixosConfigurations."vm" = nixpkgs.lib.nixosSystem {
        system = "x86_64-linux";
        modules = [
          (nixpkgs + "/nixos/modules/profiles/qemu-guest.nix")
          (nixpkgs + "/nixos/modules/virtualisation/qemu-vm.nix")
          ./vm.nix
          ./common.nix
          sops-nix.nixosModules.sops
        ];
      };
      formatter.x86_64-linux = nixpkgs.legacyPackages.x86_64-linux.nixpkgs-fmt;
    };
}
I also copied the forAllSystems function used by nixpkgs' flake.nix so my packages are cross-platform by default.
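The flake above imports ./pkgs with a pkgs argument, so that directory needs a default.nix returning an attribute set of packages. The article does not show that file; a plausible minimal shape, assuming the package file name seen in the earlier error message, would be:

```nix
# pkgs/default.nix (a guess at its shape; not shown in the article)
{ pkgs }:
{
  # Each attribute becomes packages.<system>.<name> in the flake outputs.
  systemd-failmsg = pkgs.callPackage ./systemd-failmessage.nix { };
}
```

Keeping the set in its own file means the flake only has to know how to construct pkgs for each system.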
Tried $ nix build .#systemd-failmsg, and encountered a very helpful error message!
error: You meant fetchFromGitHub, with a capital H
Given how specific this error is ("with a capital H"), this is probably a hard-coded message for this specific misspelling of a function. However, it's an example of what error messages should look like in practice.
The most related section of the NixOS manual appears to be Adding custom packages, which dutifully links to the more detailed nixpkgs manual which actually covers packaging in detail. My skim-reading caused me to skip over this link - I instead searched for mkDerivation in the NixOS manual but found very little. Nevertheless, I was able to piece the puzzle together from the NixOS manual alone, working out my roadblocks:
- Implementing the installPhase attribute
- Using bash as an input to run the package's install script (otherwise the script would result in "/bin/bash: bad interpreter: No such file or directory")
- Using the $out variable as the install prefix (instead of simply out). The error message for this is a bit vague:
  error: builder for '/nix/store/4k49hl64mvwijpxc5v53szqv5vjzkhw6-systemd-failmsg-1.3.drv' failed to produce output path for output 'out' at /nix/store/...
However, after these issues I find everything as expected under result/. I'm pleasantly surprised that Nix automatically substitutes the shebang in a script which is part of the package - #!/bin/bash is replaced with #!/nix/store/.../bash. Excellent, that spares me from some tedious patching.
I can now search for my package and even show its derivation, nifty!
$ nix search . systemd-failmsg
* packages.x86_64-linux.systemd-failmsg (1.3)
systemd toplevel override which sends emails alerts when systemd services fail
$ nix show-derivation .#systemd-failmsg
{
"/nix/store/l14p85r1cn2fix4sisr6c1cfmgzm4x3p-systemd-failmsg-1.3.drv": {
"outputs": {
"out": {
"path": "/nix/store/bxj557f73c5k8ayirj30qlh4c4lvwc5w-systemd-failmsg-1.3"
}
},
"inputSrcs": [
"/nix/store/9krlzvny65gdc8s7kpb6lkx8cd02c25b-default-builder.sh"
],
"inputDrvs": {
"/nix/store/3i6jqp61ra3031kjs1jlrmmqd2jyixd6-source.drv": [
"out"
],
"/nix/store/42pr7zqjf0y29v19q1wxn6hs5gdl5car-bash-5.1-p16.drv": [
"dev",
"out"
],
"/nix/store/ddmyhp06jqy8bxj715zwsmbcnzvx8iax-stdenv-linux.drv": [
"out"
]
},
"system": "x86_64-linux",
"builder": "/nix/store/0d3wgx8x6dxdb2cpnq105z23hah07z7l-bash-5.1-p16/bin/bash",
"args": [
"-e",
"/nix/store/9krlzvny65gdc8s7kpb6lkx8cd02c25b-default-builder.sh"
],
"env": {
"buildInputs": "",
"builder": "/nix/store/0d3wgx8x6dxdb2cpnq105z23hah07z7l-bash-5.1-p16/bin/bash",
"configureFlags": "",
"depsBuildBuild": "",
"depsBuildBuildPropagated": "",
"depsBuildTarget": "",
"depsBuildTargetPropagated": "",
"depsHostHost": "",
"depsHostHostPropagated": "",
"depsTargetTarget": "",
"depsTargetTargetPropagated": "",
"doCheck": "",
"doInstallCheck": "",
"installPhase": "PREFIX=$out bash install.sh\n",
"name": "systemd-failmsg-1.3",
"nativeBuildInputs": "/nix/store/qcalxj277ld4jiklmd2lzx6gkcvkc67k-bash-5.1-p16-dev",
"out": "/nix/store/bxj557f73c5k8ayirj30qlh4c4lvwc5w-systemd-failmsg-1.3",
"outputs": "out",
"patches": "",
"pname": "systemd-failmsg",
"propagatedBuildInputs": "",
"propagatedNativeBuildInputs": "",
"src": "/nix/store/ig1jp44jq5vy1lmdkm6ilimhq96v6157-source",
"stdenv": "/nix/store/28hqpbwpzvpff7ldbhxdhzcpdc34lgsa-stdenv-linux",
"strictDeps": "",
"system": "x86_64-linux",
"version": "1.3"
}
}
}
And finally, I can add my package to the machine's configuration.
The question is... how? Again, this would be easier if I could pass stuff from flake.nix into child modules. Doing a bit more research, it is indeed possible, with the specialArgs attribute, which is a set that is merged with the usual parameters that are passed to imported modules. I figured out that the rec keyword means "recursive" from the NixOS manual, and it allows references to other attributes within an attribute set, so I can include my flake's packages as an argument to the other modules:
{
  outputs = { self, nixpkgs, sops-nix }@inputs:
    let
      forAllSystems = f: nixpkgs.lib.genAttrs nixpkgs.lib.systems.flakeExposed (system: f system);
    in
    # rec lets nixosConfigurations refer to the packages attribute below
    rec {
      packages = forAllSystems (system: import ./pkgs {
        pkgs = import nixpkgs { inherit system; };
      });
      nixosConfigurations."vm" = nixpkgs.lib.nixosSystem rec {
        system = "x86_64-linux";
        # Merge all flake inputs with the packages
        specialArgs = inputs // { mypkgs = packages.${system}; };
        modules = [ ./vm.nix ];
      };
    };
}
And add it to my system's configuration:
{ mypkgs, ... }:
{
  imports = [
    /* ... */
  ];
  environment.systemPackages = with mypkgs; [
    systemd-failmsg
  ];
}
Here's the tree of the package:
$ tree /nix/store/...-systemd-failmsg-1.3/
├── bin
│   └── failmsg.sh
├── lib
│   └── systemd
│       └── system
│           ├── failmsg@.service
│           ├── failmsg@.service.d
│           │   └── toplevel-override.conf
│           └── service.d
│               └── toplevel-override.conf
└── share
    └── systemd-failmsg
        ├── always-fails.service
        └── doc
            ├── LICENSE
            └── README.md
I can rebuild and run the VM successfully, but the systemd services aren't available. I do see the script in bin/failmsg.sh, but nothing else. Instead of using the provided install script, I'll try to do things the NixOS way.
My initial alternative approach involved creating a Nix module for systemd-failmsg, following the pattern used by other packages/modules that provide systemd services, e.g. nginx. I tried to create the systemd unit file with systemd.services.<service> =, and a number of config files for the overrides using environment.etc. This, however, produced a number of "Permission denied" errors while building the env package. Seemingly the etc/systemd/system path was already owned by root at the respective part of the build; was it already locked down and considered read-only? I didn't have any luck searching for this issue, but I assume this is an un(der)documented special case for systemd and other core system packages.
I eventually discovered the correct solution while browsing existing examples
and issues. I was first directed to /run/current-system/sw, where I found my
service under /lib/systemd/system as I expected. This explains why /lib doesn't
exist - it's instead stashed away in here. It also shows that Nix does
intelligently install the systemd service and the configuration files. The
service did not however show up in systemctl list-units, suggesting I was
missing another step.
This step was adding the package to the systemd.packages list, which according
to the manual "enables" the service. After doing this, my service and its
systemd overrides show up under /etc/systemd, and are all picked up correctly.
In the end I was left with this fairly short module:
{ config, lib, mypkgs, ... }:
with lib;
let
  cfg = config.services.systemd-failmsg;
in
{
  options = {
    services.systemd-failmsg.enable = mkEnableOption "systemd-failmsg";
  };
  config = mkIf cfg.enable {
    environment.systemPackages = [ mypkgs.systemd-failmsg ];
    systemd.packages = [ mypkgs.systemd-failmsg ];
  };
}
And enabling the service instead involved adding the following somewhere in my system's configuration:
services.systemd-failmsg.enable = true;
Conveniently, some Nextcloud services were failing on boot, which gave me a
chance to test the failure message easily. Despite being able to execute the
script manually successfully, when executed by systemd it couldn't find the
failmsg.sh script. I found the substituteInPlace command in the manual and
applied a quick fix to the derivation:
postBuild = ''
substituteInPlace failmsg@.service \
--replace /usr/bin/failmsg.sh $out/bin/failmsg.sh
'';
The systemd service still failed, this time because it couldn't find the
commands used, including hostname, id, and sendmail. A bit of digging around
issues and the manual showed that I needed to set up the path correctly - makes
sense, considering all the magic Nix does with $PATH.
I was initially considering using the substituteInPlace command again to
replace them with the appropriate executable path, e.g.
--replace hostname ${inetutils}/bin/hostname, but that didn't feel right.
Looking at some more examples and the manual, I found the writeShellApplication
builder. This is meant more for packages wherein the only output is a single
shell script, but it automagically handles the script's $PATH for you, so the
script can use the commands like normal instead of absolute paths to the Nix
store.
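For comparison, here is a minimal sketch of what the writeShellApplication route might look like. The script body is a stand-in rather than the real failmsg.sh, and the name is only illustrative; runtimeInputs is the part that ends up on the script's $PATH.

```nix
{ writeShellApplication, inetutils, coreutils }:

writeShellApplication {
  # Illustrative name; the script is installed as $out/bin/failmsg
  name = "failmsg";
  # Packages whose bin/ directories are put on the script's PATH
  runtimeInputs = [ inetutils coreutils ];
  # Stand-in body, not the real failmsg.sh
  text = ''
    echo "Unit failed on $(hostname), reported by $(id -un)"
  '';
}
```

Since this builder produces only the script, the systemd units and override files would still need to be installed some other way, which is why it isn't a great fit here.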
In the end I saw wrapProgram in some examples, which, as the name suggests,
wraps a program, setting environment variables as desired. I applied this to the
failmsg.sh script in the derivation, adding the required dependencies to $PATH:
let
  inputs = [ inetutils system-sendmail coreutils ];
in
# ...
nativeBuildInputs = [ makeWrapper ];
buildInputs = [ bash ] ++ inputs;
installPhase = ''
  PREFIX=$out bash install.sh
  # Put the runtime dependencies on the wrapped script's PATH
  wrapProgram $out/bin/failmsg.sh \
    --prefix PATH : ${lib.makeBinPath inputs}
'';
(Question for the future: what differentiates nativeBuildInputs, buildInputs,
and the like? Which are build-time only, which are runtime? What does "native"
mean?)
This resolved the issue with unavailable commands, but the service was still
failing because the network was unavailable while the service was starting up.
The problem lies upstream: the service only contains After=network.target, so it
doesn't have a hard requirement on network availability. I applied a fairly
simple fix by adding a Requires and After to the systemd unit in the Nix module:
systemd.services."failmsg@".unitConfig = {
After = "network-online.target";
Requires = "network-online.target";
};
(TODO: should I submit this patch upstream?)
Finally, the service is working as expected, and I'm content with its Nix implementation.
Passing nix flake check
I came across the nix flake check command, which runs tests on your flake. So
I thought it would be a good idea to ensure my flake passes.
The first problem I hit was making my systemd-failmsg package only available on
Linux, otherwise the check produces the following error:
$ nix flake check
error: Package ‘systemd-failmsg-1.3’ in /nix/store/...-source/nix/pkgs/systemd-failmsg/default.nix:22 is not supported on ‘x86_64-darwin’, refusing to evaluate.
It was surprisingly difficult to find the appropriate method for "only create this attribute if this condition holds". After a while I eventually remembered coming across the pattern of extending with an attribute set which may be empty depending on the condition. A simple enough function which is probably just implemented like so:
c: set: if c then set else {}
In the end I found the pattern in the nixpkgs manual, as the function
lib.attrsets.optionalAttrs.
Underneath it was the handy type specification, in the lovely format I know from
Haskell:
optionalAttrs :: Bool -> AttrSet
After discovering the NixOS options search I can't help but wish there were a Hoogle equivalent for Nix/nixpkgs library functions.
Back to the problem at hand, I tried using the condition pkgs.stdenv.isLinux,
but this causes a strange error on nix flake check:
error: attribute 'busybox' missing
Adding --show-trace, I eventually realised from the very start of the trace that
the system causing the error is actually mipsel-linux:
… while checking the derivation 'packages.mipsel-linux.systemd-failmsg'
at /nix/store/cl9c0n6wlgga14mcqkv7hypc6djdab00-source/nix/pkgs/default.nix:4:3:
3| // pkgs.lib.optionalAttrs pkgs.stdenv.isLinux {
4| systemd-failmsg = pkgs.callPackage ./systemd-failmsg.nix { };
| ^
5| }
… while checking flake output 'packages'
at /nix/store/3hnh67ga9pw39j495b94h4fwhgqscm37-source/nix/flake.nix:13:7:
12| in rec {
13| packages = forAllSystems (system: import ./pkgs {
| ^
14| pkgs = import nixpkgs { inherit system; };
This system appears to currently be broken, so I'll try specifically filtering
out mipsel-linux.
{ pkgs }:
{ }
// pkgs.lib.optionalAttrs (pkgs.stdenv.isLinux && pkgs.system != "mipsel-linux") {
systemd-failmsg = pkgs.callPackage ./systemd-failmsg { };
}
And nix flake check passes!
After coming across a common flake.nix pattern, I decided to invert this
system filtering, and instead only support a subset of platforms in my flake:
{
outputs = { self, nixpkgs, sops-nix }@inputs:
let
# Specify the list of supported platforms
systems = [ "x86_64-linux" ];
forAllSystems = f: nixpkgs.lib.genAttrs systems (system: f system);
in
rec {
packages = forAllSystems (system: import ./pkgs {
pkgs = import nixpkgs { inherit system; };
});
};
}
Considering only the platforms I care about seems like an easier way to go about
this, rather than trying to support all the platforms Nix does. If I want to
deploy to other platforms in the future I can simply add to systems.
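To make the forAllSystems helper less magic: lib.genAttrs takes a list of names and a function, and builds an attribute set with one attribute per name. Below is a small sketch that can be evaluated on its own, assuming nixpkgs is reachable as <nixpkgs>. The second system and the string values are only placeholders for illustration.

```nix
let
  lib = import <nixpkgs/lib>;
  # aarch64-linux is included purely for illustration
  systems = [ "x86_64-linux" "aarch64-linux" ];
  forAllSystems = f: lib.genAttrs systems (system: f system);
in
# Evaluates to an attribute set with one entry per listed system
forAllSystems (system: "packages for ${system}")
```

The result has the keys x86_64-linux and aarch64-linux, which is the per-system shape that flake outputs such as packages expect.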
Overlays: fixing nextcloud-news-updater
Nextcloud's news-updater is a handy tool for speedily updating RSS feeds. It
unfortunately hard-codes running the occ command under the Nextcloud
installation directory. However, NixOS has its own nextcloud-occ script which
conveniently wraps the normal command with sudo and the normal Nix stuff.
I generally followed the examples on the Flakes wiki page and managed to get it patched via an overlay.
Worth noting that I spent a long time trying to debug my first attempt, as
nix flake check would result in "error: infinite recursion encountered" in
fixed-point.nix. It took me a while to find that the issue was specifying the
arguments as a set { self, super }: instead of two separate arguments
self: super:. And this is why types and type checking are useful. That, and
useful errors.
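For illustration, an overlay of roughly this shape would do the job. This is a sketch rather than the exact overlay used here: the patch file name is hypothetical, and it assumes the package is exposed as nextcloud-news-updater in nixpkgs.

```nix
# overlay.nix
# An overlay is a curried function of two arguments: the final package
# set (self) and the previous one (super). Writing it as a single set
# argument, { self, super }: ..., is the mistake described above.
self: super: {
  nextcloud-news-updater = super.nextcloud-news-updater.overrideAttrs (old: {
    # Hypothetical patch that makes the updater call nextcloud-occ
    patches = (old.patches or [ ]) ++ [ ./use-nextcloud-occ.patch ];
  });
}
```

It would then be applied when importing nixpkgs, e.g. import nixpkgs { inherit system; overlays = [ (import ./overlay.nix) ]; }.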
Packaging Firefox Syncserver
This was easily the most painful part of the conversion process.
At the time of writing, there's no official package for this Python app in nixpkgs, because it's stuck on Python 2 and its dependencies have been broken. The new implementation syncstorage-rs was recently packaged; however, I wanted to stick with the old version I was already using. Also, the new app only supports MySQL and I didn't want to run yet another database engine on top of PostgreSQL, when a SQLite database would do just fine.
Unfortunately there's no "official" tool for easily creating a Nix derivation for a Python package, with all its dependencies nicely bundled up.
I took a look at a few tools to do just that, with mixed success.
Ideally what I want is a tool that I provide a package name (and maybe version), which then pulls down all its dependencies and generates Nix derivations for them as necessary, effectively pinning them in the Nix fashion. I wouldn't mind if it autogenerated code for me, I just don't want to write it all myself.
pypi2nix
Searching through older Nix guides and discussion, pypi2nix was brought up as an easy way to import existing Python packages. However, the project's repository is now marked as archived, and it was indeed marked as broken in nixpkgs. I gave up on it quite quickly, looking for other suggested alternatives.
mach-nix
On the surface, it seems like a sensible system: you supply a list of requirements, just as in your average Python project, and dependency resolution is managed for you.
However, I quickly ran into trouble with Python 2 support, with many packages showing "not supported for interpreter python2.7" errors. mach-nix isn't able to handle evaluation errors due to packages marked as broken, which appears to be more of a problem with the Nix language. This apparently makes it hard for mach-nix to do its dependency resolution without running into evaluation errors for Python 2 packages.
I tried manually editing mach-nix to use tryEval in a few more places, but I
couldn't reliably fix the problem. Local testing in the nix repl seemed to
reliably prevent errors when evaluating derivations directly, but it failed to
catch evaluation errors for a derivation's dependencies. It also wasn't easy to
get a list of all of a derivation's dependencies (see an attempt here), which
made trying to fix this even more tedious.
At this point I also realised the underlying method mach-nix uses to make things "pure" as Nix requires: fetching a Git repository which indexes everything available in PyPI and other Python repositories. This index currently sits at over 400MB, which I'm sure will only grow with time. Personally, I feel this is a sledgehammer approach. I appreciate Nix already isn't particularly conservative when it comes to disk usage, but this is extreme.
poetry2nix
I initially thought poetry2nix was only for packages already using Poetry, so I overlooked it in favour of trying out other Python-to-Nix tools. Reading a bit more into it, I realised I could repackage syncserver myself.
I add python3Packages.poetry to my dev shell, and go through the poetry init
process, requiring python 2.7. But when trying to poetry add some dependencies,
it complains about the current Python version not being 2.7. It's a bit
confusing that a build tool needs to run on the version the end application
uses, but I suppose it may need to evaluate the setup.py script.
This initially wasn't easy, poetry2nix doesn't officially support Python
2 because poetry
is
marked as broken for Python 2 in nixpkgs, due to
After running poetry env use python2.7, it complains about my current version
2.7.xx not matching 2.7. Of course, I need to make the version requirement
fuzzy by instead specifying ^2.7. After the adjustment, poetry env use python2.7
works.
poetry add got a bit further now, but started complaining about the explicit
version requirements conflicting with transitive dependencies. To my horror, it
turns out pip doesn't do proper dependency resolution: apparently it simply
accepts the first version of a package that is requested. Naturally, Poetry
doesn't provide an escape-hatch for this situation, making it a pain to port
packages relying on this broken pip behaviour to Poetry.
My solution was to include and patch a nested copy of the broken dependency.
This could probably be replaced with the usual Nix source patching mechanism
along with a poetry2nix override.
Before working out how poetry2nix overrides worked, I tried manually adjusting
the poetry.lock file to change the source of another broken dependency,
umemcache, which has unreleased fixes.
Annoyingly, I hit a poetry bug which interfered with poetry2nix, where
dependency metadata wasn't being specified, which is what poetry2nix relies on
to make all the dependency fetching pure. I fixed this with a nix flake update
to nixpkgs. Unfortunately this introduced more issues, including an infinite
recursion error, and a large number of packages requiring setuptools to be added
to their dependencies. I managed to work around both with a tedious number of
poetry2nix overrides.
Another limitation I hit was Poetry not locking URL dependencies, causing
confusing errors about hashes being required when building with poetry2nix. The
fix is to override the src of the offending packages, repeating the URL and
providing the correct hash.
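As a rough illustration of both kinds of override, here is a sketch following the override pattern from the poetry2nix README. The package names, URL, and hash are placeholders, not the ones from my actual configuration.

```nix
{ poetry2nix, python2, fetchurl, lib }:

poetry2nix.mkPoetryApplication {
  projectDir = ./.;
  python = python2;
  overrides = poetry2nix.overrides.withDefaults (self: super: {
    # Placeholder for a package whose build needs setuptools declared
    some-dependency = super.some-dependency.overridePythonAttrs (old: {
      buildInputs = (old.buildInputs or [ ]) ++ [ self.setuptools ];
    });
    # Placeholder for a URL dependency that Poetry did not lock:
    # repeat the URL and supply the hash so the fetch is pure
    other-dependency = super.other-dependency.overridePythonAttrs (old: {
      src = fetchurl {
        url = "https://example.invalid/other-dependency-1.0.tar.gz";
        sha256 = lib.fakeSha256;
      };
    });
  });
}
```

lib.fakeSha256 is only a stand-in: the build fails with a hash mismatch that reports the real hash, which can then be pasted in.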
After this, I finally had a build-able package. I then used the dependencyEnv
attribute to get a store path containing all the dependencies, including
Gunicorn, which I needed to execute my service. I hit a collision between some
files, so I had to set ignoreCollisions = true;:
Collision between backports-functools-lru-cache and python2.7-configparser: python2.7/site-packages/backports/__init__.pyc
Configuring the service
From there, I amended my NixOS module for firefox-syncserver until it was up and running. Some fun problems:
Nix: confusing errors when generating the config file with format.generate and
recursiveUpdate. I've forgotten the details of what caused this.
Gunicorn: fun trial and error with the systemd service hardening. It
additionally needed the @chown and @setuid system call groups, for changing the
owner of a file in /tmp and setting the user for worker processes respectively.
Gunicorn: having no logging by default! And making it tedious to configure logging.
Syncserver: forgetting the https:// in the public_url config option causes an
annoying mismatch between public_url and the origin URL received by Gunicorn.
The error message also unhelpfully duplicates the received URL, causing me to
confuse the issue for a bug in Nginx or Gunicorn itself.
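The Gunicorn hardening adjustment can be sketched as follows. This assumes the unit is called firefox-syncserver and that the hardening started from the commonly used @system-service allow-list with ~@privileged removed. Neither detail is stated above, so treat both as guesses.

```nix
{ ... }:
{
  # Hypothetical unit name
  systemd.services.firefox-syncserver.serviceConfig = {
    SystemCallFilter = [
      # Assumed starting point for the allow-list
      "@system-service"
      "~@privileged"
      # Re-allowed afterwards: chown of a file in /tmp, and setting
      # the user for worker processes
      "@chown"
      "@setuid"
    ];
  };
}
```

The order matters here: systemd processes the entries in sequence, so the two groups are added back after ~@privileged has removed them.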
My website
I use a fairly simple deployment method for my website: $ git push with a
post-receive hook on the receiving end, which rebuilds the website using Zola
(and Graphviz for some diagrams).
This introduces a problem to the Nix system configuration for the webserver: cloning a Git repository and updating it separately is an impure operation. It doesn't make much sense to wrap the website in a Nix package if I want to be able to painlessly update the website separately to the system it runs on. However, I would like the NixOS configuration to include the initial setup of the repository (cloning it, and setting up that post-receive hook).
While searching for ways to implement this, I came across
system.activationScripts in other people's configurations. This seems to fit the
bill - it allows you to specify arbitrary scripts that run on boot and on
nixos-rebuild. I created a script that clones the repository to an appropriate
location, and installs a wrapped version of the post-receive hook.
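A minimal sketch of such an activation script follows. The repository URL, the target directory, and the hook body are all placeholders, and the script name websiteRepo is made up for this example.

```nix
{ pkgs, ... }:
let
  # Hypothetical location and source of the repository
  repoDir = "/var/lib/website/repo.git";
  repoUrl = "https://example.invalid/website.git";
  # Placeholder hook; the real one checks out the push and runs Zola
  hook = pkgs.writeShellScript "post-receive" ''
    echo "rebuilding website"
  '';
in
{
  system.activationScripts.websiteRepo = ''
    # Clone only once; later pushes update the repository impurely
    if [ ! -d ${repoDir} ]; then
      ${pkgs.git}/bin/git clone --bare ${repoUrl} ${repoDir}
    fi
    # Re-link the hook on every activation so it tracks the Nix store
    ln -sfn ${hook} ${repoDir}/hooks/post-receive
  '';
}
```

Activation scripts run as root, so a real version would also need to sort out ownership of the repository for the user that receives pushes.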
Deploying
I used nixos-infect, as a cloud-init script when allocating a VM on Hetzner (with Ubuntu as the base OS). It works perfectly, and after a reboot the new VM claims it is NixOS:
$ cat /etc/os-release
BUG_REPORT_URL="https://github.com/NixOS/nixpkgs/issues"
BUILD_ID="22.05.3475.ed9b904c5eb"
DOCUMENTATION_URL="https://nixos.org/learn.html"
HOME_URL="https://nixos.org/"
ID=nixos
LOGO="nix-snowflake"
NAME=NixOS
PRETTY_NAME="NixOS 22.05 (Quokka)"
SUPPORT_URL="https://nixos.org/community.html"
VERSION="22.05 (Quokka)"
VERSION_CODENAME=quokka
VERSION_ID="22.05"
Then I attempted to deploy my system configuration with nixos-rebuild:
$ nixos-rebuild switch --target-host atlas2 --flake .#webserver
error:
Failed assertions:
- The ‘fileSystems’ option does not specify your root file system.
- You must set the option ‘boot.loader.grub.devices’ or 'boot.loader.grub.mirroredBoots' to make the system bootable.
Probably missing the /etc/nixos/hardware-configuration.nix. Copied it from the
VM and included it in the system config.
Another error:
error:
Failed assertions:
- You must set the option ‘boot.loader.grub.devices’ or 'boot.loader.grub.mirroredBoots' to make the system bootable.
Weird that nixos-infect doesn't sort out the bootloader for you. I manually set
boot.loader.grub.device = "/dev/sda"; and that sorted that.
The next deploy worked, but there were a couple of errors, because I forgot to update the sops keys.
I also messed up and set PermitRootLogin no in the SSH config without setting
up a user with sudo access, so I rolled back over the existing SSH connection:
$ nixos-rebuild switch --rollback
Glad to see that switching works without rebooting! It even disables & stops services, deletes groups, and all that stuff.
While sorting this out, it was annoying having to swap my local SSH config back-and-forth between root and a normal user.
Found the --use-remote-sudo option for nixos-rebuild, but this seems to
require passwordless sudo. I set that up with the following options:
users.users.<user>.extraGroups = [ "wheel" ];
security.sudo.wheelNeedsPassword = false;
Re-did the deploy with --use-remote-sudo and it had the same result, so I'm
taking it as done.
Some strange errors on the next rebuild:
error: cannot add path '/nix/store/1rhrxs9n736a4f3gqlqmi211xnhi10ka-nextcloud-config.php' because it lacks a valid signature
From this issue, it appears to be caused by nixos-rebuild copying things from my
local store to the server's store - by default copying is apparently only
allowed from cache.nixos.org.
At this point I messed up the SSH and sudo permissions and was locked out of
root. I also didn't have a root password, so I went through Hetzner rescue. The
lack of a normal $PATH made it somewhat tedious to run the commands needed,
including bash and passwd. I had to search the store for them.
The "valid signature" issue turned out to be due to swapping to the normal user
with --use-remote-sudo. Apparently one needs to tweak nix.settings.trusted-users
to include any users that are able to nixos-rebuild. To resolve this, I added
@wheel to this setting, and ran nixos-rebuild from the machine itself to work
around the issue. I added this to the wiki page to highlight the problem before
it occurs.
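In configuration terms, the fix amounts to something like the following. Listing root explicitly is an assumption here, made to keep the default entry once the option is set.

```nix
{ ... }:
{
  # Let members of wheel copy unsigned store paths to this machine,
  # which is what nixos-rebuild --target-host does
  nix.settings.trusted-users = [ "root" "@wheel" ];
}
```

Trusted users can effectively do anything root can with the Nix store, so this is best limited to accounts that already have sudo.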
Performing the migration
At this point I had a fully functional NixOS system, reproducing just about
everything I needed from the old server! As a temporary measure, I replicated my
normal website DNS entries with an added prefix: new.williamvds.me - this was
quite easy since I'd made the domain name a Nix option.
I followed some standard data migration procedures, copying over my Nextcloud data and PostgreSQL database. Things generally "just worked", though I noticed a few odd things that needed minor adjustments to options.
Some testing ensued: I made sure all the new services worked by logging into them on my devices. After some more hassle with firefox-syncserver, I was fairly confident all was in order, so I migrated across by editing the normal DNS entries to point to the new server's IP address.