A Journey into Nix and NixOS

21 October 2022

I chronicle my introduction to and experiments with using a rather radical GNU/Linux operating system on my webserver

Before I started this journey, I ran this website and a few personal services on the cheapest VPS that OVH offer, but the limited disk space has frequently given me trouble. 20GB isn't a lot to work with when you're using it to synchronise and stash personal files. I planned to cancel its upcoming renewal, and instead migrate to another VPS provider that provides a bit better value for money. However, this introduces the problem of manually migrating and setting things up again. Naturally, having done most of the setup over a year ago, I'd forgotten most of it. I wasn't keen on repeating that mistake: I wanted a proper automated deployment method. One that doesn't rely on me doing everything manually, or rely on all the knowledge staying in my head.

Ansible and similar are the "industry-standard" tools for doing this, but I've seen Ansible used at a previous job, and witnessed how slow it can be, and how messy it can get with its piles of YAML. Needless to say, I wasn't left with the best impressions, and wasn't too keen on it for personal use. I'd come across Nix online through articles and discussions, including Xe Iaso's blogs and talks, and the Self Hosted podcast. Its approach sounded novel and interested me, so I took the server migration as an opportunity to experiment with it.

I wrote most of this article as I went along, and it chronicles most of my trial-and-error in creating a NixOS configuration. It's mostly a stream of consciousness, serving primarily as a record of what I struggled with and thoughts that I had along the way. In a subsequent article, I'll write up my perspective on Nix and NixOS and what I'd like to see from it in the future. That one will hopefully be less ramble-y and more interesting to the average reader.

It's been a long road: I've been working on it on and off for months now, starting in May 2022 and continuing until just before this article was published. Thankfully I consider this experiment a success: my data and services had all been migrated to the new server before my old VPS was up for renewal, as I'd planned. It's been quite messy and frustrating at times, but I'm quite happy with how things turned out. I think all this work is a "once and for all" effort to ensure I don't have to manually set up my server again. I'm also reasonably happy with Nix, and I think I'll be using it more in the future.

If you're interested in viewing the end result (i.e. my NixOS configuration), you can find it in my configuration repository.

Preparatory reading

As with most things, I initially started with some web searches, and came across How to Learn Nix, which I ended up reading through quite a lot of.

I additionally came across Nix Pills, which I think I briefly skimmed, and the Nix Wiki, which I initially referred to frequently while getting started.

Installing on my Arch computer

Fairly straightforward, it's available in Arch's community repository, so I installed it through pacman. After hitting some errors, I followed the wiki's guide and added the nixpkgs-unstable channel.

Then I looked at managing user config files. I found home-manager, but it sounded like way more than I need - I want to keep my dotfiles for portability when I can't use Nix, I just need a little Nix expression for installing the few config files that need to live outside of ~/.config as symlinks.

A user Nix config for installing applications would be neat, but I decided to postpone until I'm further along. I decided to make my first goal reproducing my webserver configuration with Nix. So on to setting up a playground environment before investing in another webserver.

Raspberry Pi

I have a Raspberry Pi free for some experimentation, so I thought I'd use it as a testbed in place of a real VPS.

The wiki article suggests downloading and writing the SD card, but it's a graphical installer. I don't have a micro HDMI cable for the Pi so I want a headless installation with SSH (which is what I used with Arch on ARM). I assume I need to build my own image as suggested in the wiki, including my SSH key. Initially I followed the wiki article, using cross-compilation.

{ ... }: {
  nixpkgs.crossSystem.system = "aarch64-linux";
  imports = [
    <nixpkgs/nixos/modules/installer/sd-card/sd-image-aarch64.nix>
  ];
  users.extraUsers.root.openssh.authorizedKeys.keys = [
    "ssh-rsa ...."
  ];
}

$ nix-build --system aarch64-linux --keep-failed --store /data/nix-raspberrypi '<nixpkgs/nixos>' -A config.system.build.sdImage -I nixos-config="$HOME"/.config/nix/raspberrypi-sdcard.nix -I nixpkgs=channel:nixos-21.11-aarch64

What does the command mean? Strangely it wanted to rebuild a lot of applications. Why isn't it downloading everything from the cache? First roadblock was an error in liburing, but hmm, this appears to be building fine on hydra. I find the --keep-failed option to actually inspect what's wrong. Something in the post-install script referred to a nonexistent file. Tried adding the nixos-21.11 and nixpkgs-21.11-aarch64 channels - no change. Found the --dry-run option - why are the hashes on hydra different to my nix-build? (Was it because I didn't pass --pure?) I followed some of my previous reading material and managed to override the liburing derivation using overrideAttrs:

{ ... }: {
  nixpkgs.crossSystem.system = "aarch64-linux";
  imports = [
    <nixpkgs/nixos/modules/installer/sd-card/sd-image-aarch64.nix>
  ];
  nixpkgs.overlays = [
    /*
    (self: super:
      {
        liburing = super.liburing.overrideAttrs (attrs: {
          postInstall = ''
            # Copy the examples into $bin. Most reverse dependency of this package should
            # reference only the $out output
            mkdir -p $bin/bin
            cp ./examples/io_uring-cp examples/io_uring-test $bin/bin
            cp ./examples/link-cp $bin/bin/io_uring-link-cp
          '';
        });
      })
    */
  ];
  users.extraUsers.root.openssh.authorizedKeys.keys = [
    "ssh-rsa ...."
  ];
}

Also try the undocumented --store option to override the store because my root drive is running out of space. This causes issues later, so I clear up some space and use the default /nix/store instead.

More building...

Why am I building the Linux kernel? And the AMD drivers and Nouveau? I don't need those. I hit "no space on device" errors: my tmpfs is running out of space building all of this stuff. I find some GitHub issues that mention remounting /tmp and resizing it to 14GB. Fine, I do that. (Later someone mentions that it actually uses $TMPDIR rather than hardcoding /tmp, so I could have used a different filesystem.)

Find the nix-store --read-log option, which is quite neat. Before finding it, I ran builds several times over after losing the tmux scrollback buffer (and later increasing the tmux scrollback limit).

More building...

Make: error 2 during linking(?) At this point I was lurking in the NixOS on ARM channel, someone mentioned to someone else that cross compilation isn't reliable. Suggests the qemu-user method. Fine, time to try that.

I install qemu-user and the binfmt packages from the AUR, add this to /etc/nix/nix.conf:

extra-sandbox-paths = /usr/bin/qemu-aarch64-static
extra-platforms = aarch64-linux

Still get some errors. I try overriding nixpkgs.system and system, no dice. Apparently --argstr system aarch64-linux isn't working right now. Find another undocumented option, --system, which does appear to work.

Later I find out (from the chat?) the distributed SD card image does support SSH, you just need to create /root/.ssh/authorized_keys on the flashed system. So that whole exercise was a waste of time.

I flash the SD card, plug Pi and Ethernet in, no green light, but lights on Ethernet. Pi debugging ensues, replugging the SD card before Ethernet eventually works. Pi shows up on the router's web UI, but isn't assigned an IP address. DHCPD issue? I have no idea, and I'm not sure how to debug. I give up, maybe I should try a VM, which should be simpler, right?

Building a QEMU image

I find a couple of guides for making VMs with Nix:

I also find a link to the VM Nix module, which has all the available options. I ideally want no graphical window, just a shell.

Why isn't nix-build -A vm '<nixpkgs/nixos>' --arg configuration "{ imports = [ <nixpkgs/nixos/modules/virtualisation/build-vm.nix> ]; }" -I nixos-config=vm.nix working? Works if I put everything in vm.nix and add the build-vm.nix import. Probably the configuration argument being ignored.

VM template:

{ pkgs, ... }:

{
  imports = [
    <nixpkgs/nixos/modules/profiles/qemu-guest.nix>
    <nixpkgs/nixos/modules/virtualisation/qemu-vm.nix>
  ];

  config = {
    system.stateVersion = "22.05";

    fileSystems."/" = {
      device = "/dev/disk/by-label/nixos";
      fsType = "ext4";
      autoResize = true;
    };

    boot = {
        growPartition = true;
        kernelParams = [ "console=ttyS0" ];
        loader.grub.device = "/dev/vda";
        loader.timeout = 0;
    };

    users.extraUsers.root.password = "";

    services.getty.autologinUser = "root";

    networking.hostName = "vm";

    virtualisation = {
      cores = 4;
      graphics = false;
      qemu.options = [ "-serial mon:stdio" ];
      # Forward any needed ports
      forwardPorts = [
        { from = "host"; host.port = 8080; guest.port = 80; }
      ];
    };

    nix.extraOptions = "
      extra-experimental-features = nix-command flakes
    ";
  };
}

Runs successfully with:

$ nix-build '<nixpkgs/nixos>' -A vm --arg configuration ./vm.nix
$ ./result/bin/run-vm-vm

I'm not sure how to break down the nix-build call. Include <nixpkgs/nixos> and vm.nix, evaluate the expression vm? What is the vm attribute? Where does it come from?
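One plausible reading of that command, offered as a sketch rather than an authoritative answer: <nixpkgs/nixos> resolves to nixos/default.nix inside nixpkgs, which is a function that takes a configuration argument. --arg configuration ./vm.nix supplies that argument, and -A vm selects the vm attribute of the result, which nixos/default.nix re-exports from config.system.build.vm. Written out as a plain expression, it would look roughly like this:

```nix
# Roughly what
#   nix-build '<nixpkgs/nixos>' -A vm --arg configuration ./vm.nix
# evaluates.
let
  # <nixpkgs/nixos> is a function; calling it runs the NixOS module system
  # over the given configuration.
  nixos = import <nixpkgs/nixos> { configuration = ./vm.nix; };
in
# The same derivation should also be reachable as
# nixos.config.system.build.vm.
nixos.vm
```

If this reading is right, it may also explain the earlier failure: once configuration is passed explicitly, the default of <nixos-config> (which is what -I nixos-config=vm.nix sets) is never looked at, so vm.nix would have been the part that was ignored.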

Writing a Nix module for my webserver

Nextcloud

I first find the wiki page, but it's pretty barren. It doesn't even link to the manual page. I follow the next search result and find the manual section.

Setup is mostly straightforward, with most of the time spent trying to create these secret files on the disk. This required a bit more exploration of the Nix language.

I initially think I could manage them by overriding options/attributes when calling nix-build. However, the nix-build switch --option appears to refer to Nix options, i.e. things that go in /etc/nix/nix.conf, while --arg and --argstr appear to be function arguments, i.e. things specified in { arg }: at the start of a module.

Is there a point in exploring this approach, if it'll be in plaintext in the store anyway? In the next section, I look into NixOps and if/how it handles secrets sensibly.

Selecting Nextcloud apps is a bit janky. I was expecting it to just need the app name, but it wants you to manually specify the download URL, hash, and all that jazz. I assume it's for reproducibility, and no one's created packages for Nextcloud apps yet.

Dealing with secrets

Sorting this out is probably going to be unavoidable if I want to use Nix "properly", that is, running one command to completely reproduce a system with virtually zero manual work required.

I find the wiki article on the subject

Lots of options, but none of them are ideal or free of extra effort. Proper secret management within Nix and the Nix store has been on the back-burner for a long time: https://github.com/NixOS/nix/issues/8.

My ideal: hook into pass, my password manager, to retrieve my secrets. Let me specify the required ones in my configuration, and deploy them when I'm running NixOps or whatever, with appropriate (and configurable) file permissions. Let them be persistent so I don't have to redeploy them every time, and clean up older ones if I remove them from the config.

NixOps

Okay, the default NixOps key system looks fine. You can override where keys are saved so I can create them in /secrets or something. I'm thinking to use NixOps to deploy to real servers anyway, so I'm gonna have to learn it at some point. Looks like it has some features for delaying/restarting systemd units when secrets change or are made available, which is handy.
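As a rough sketch of what that could look like, here is a fragment of a machine definition using NixOps' deployment.keys options. Everything specific in it is an assumption for illustration: the key name, the pass entry, the /secrets directory, the nextcloud owner and the port are hypothetical, and it relies on keyCommand being available in the NixOps version in use, which is worth checking against the manual.

```nix
# Fragment of a machine definition in a NixOps network file (sketch only).
{
  vm = { ... }: {
    deployment.targetHost = "vm";
    deployment.targetPort = 2222; # hypothetical SSH port

    deployment.keys."nextcloud-admin" = {
      # Assumption: keyCommand runs on the deploying machine and its
      # standard output becomes the key's contents.
      keyCommand = [ "pass" "show" "nextcloud/admin" ]; # hypothetical entry
      destDir = "/secrets"; # overrides the default key directory
      user = "nextcloud";
      permissions = "0400";
    };
  };
}
```

Pulling the secret from a local command at deploy time keeps it out of the Nix store, which is the main attraction of this approach.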

What is up with the NixOps manual on nixos.org? It has very little info, is lacking context on how to write NixOps configuration files and load commands.

Proper manual appears to be on releases.nixos.org

Can I override the SSH identity file used or do I really need to adjust my ~/.ssh/config?

I type in the host name as I would if I were running ssh <host>, but for some reason not all SSH options are picked up. E.g. I need to set deployment.targetPort, otherwise NixOps tries to connect to port 22.

$ nixops deploy is building stuff - is it using the version of nixpkgs I'm running NixOps from? Do I need to create another deployment to reload the NixOps network file?

No, it looks okay in the end. But hits some errors at the end of the deploy, I assume running post-update GRUB hooks?

vm> updating GRUB 2 menu...
vm> installing the GRUB 2 boot loader on /dev/vda...
vm> Installing for i386-pc platform.
vm> /nix/store/4bmcgmin11ixrn2ij2jjinlddcxql1g6-grub-2.06/sbin/grub-install: warning: File system `ext2' doesn't support embedding.
vm> /nix/store/4bmcgmin11ixrn2ij2jjinlddcxql1g6-grub-2.06/sbin/grub-install: warning: Embedding is not possible.  GRUB can only be installed in this setup by using blocklists.  However, blocklists are UNRELIABLE and their use is discouraged..
vm> /nix/store/4bmcgmin11ixrn2ij2jjinlddcxql1g6-grub-2.06/sbin/grub-install: error: will not proceed with blocklists.
vm> /nix/store/lc7cvzrgsxks87g3jnjbg8pg2pqn15vh-install-grub.pl: installation of GRUB on /dev/vda failed: No such file or directory
vm> error: Traceback (most recent call last):
  File "/nix/store/4zh7crx1r2sizvpyb1c9h109yfimlzn2-python3-3.9.12-env/lib/python3.9/site-packages/nixops/deployment.py", line 893, in worker
    raise Exception(
Exception: unable to activate new configuration (exit code 1)

Why is it installing for i386? My VM is x86_64. Why is it detecting the filesystem as ext2? /dev/vda is ext4.

I assume there are errors from updating within the VM itself, and probably not NixOps' fault. It just happened to update some packages.

Further research: grub-install is apparently trying to update an MBR partition with UEFI stuff? Try to force the VM to use UEFI with GRUB by inspecting the qemu module.

I find a related option, and try setting virtualisation.useEFIBoot = true, but get the following error:

qemu-kvm: -drive if=pflash,format=raw,unit=1,file=: A block device must be specified for "file"

Further inspection: there is a dependency between useEFIBoot and useBootLoader (the latter defaults to false).

I take a brief detour to enable KVM in my BIOS.

With the new changes, the VM just boots into the UEFI shell, instead of directly to the system.

Back to the normal VM. It's quicker now due to the KVM modules loaded, but of course back to the same NixOps error.

Maybe I should create VMs through NixOps instead, which will be better supported? How about a quick detour and using sops-nix instead?

(In hindsight, this detour was not quick at all)

sops-nix and Flakes

Again, not ideal because I'd prefer to just use passwords stored in pass. But I understand why sops is preferable when it comes to Nix - the encrypted secrets can be used as an input to the system configuration. For now it seems like one of the easier options to go for. I like that it apparently hooks into nixos-rebuild and the like.

Quick question, how do I use nixos-rebuild when I'm not on NixOS? Turns out it's packaged in nixpkgs, duh. I run it with $ nix-shell -p nixos-rebuild

sops-nix requires the target host's SSH key which is rather annoying, since it means having to deploy after creating a machine. And some manual work involved in extracting the SSH public key of new machines.

Setup guide is rather convoluted and complex, due to multiple encryption methods: using age keys, converting GPG or ssh keys to age keys.

Try to copy sops-nix example, so I guess I'm trying out flakes now:

{
  inputs = {
    nixpkgs.url = "nixpkgs/nixpkgs-unstable";
    sops-nix.url = "github:Mic92/sops-nix";
    sops-nix.inputs.nixpkgs.follows = "nixpkgs"; # use the same nixpkgs from this flake's input, this seems hacky
  };

  outputs = { self, nixpkgs, sops-nix }: {
    nixosConfigurations."vm" = nixpkgs.lib.nixosSystem {
      system = "x86_64-linux";
      modules = [
        ./vm.nix
        sops-nix.nixosModules.sops
      ];
    };
  };
}

Try to deploy with $ nixos-rebuild switch --target-host vm --flake '.#vm'

error: getting status of '/nix/store/ppm1c1rch1p6w260kpfxma6kww8wkfdf-source/nix/flake.nix': No such file or directory

Okay, turns out I need to git add flake.nix for some reason.

error: cannot look up '<nixpkgs/nixos/modules/profiles/qemu-guest.nix>' in pure evaluation mode (use '--impure' to override)

I guess I need to explicitly add the nixpkgs flake? Do flakes change the way imports work? How do I translate a <nixpkgs/...> import into an expression?

Now I try to rebuild my VM, but now I'm getting a new error:

error: The option `sops' does not exist. Definition values:
       - In `/home/william/.config/nix/vm.nix':
           {
             defaultSopsFile = /home/william/.config/nix/secrets/common.yaml;
             secrets = {
               "nextcloud/admin" = { };
               "nextcloud/database" = { };

Durr, this is because I'm still running nix-build but I'm using the sops option in my configuration. I assume I need to convert this to the new nix build command to make use of Flakes? Let's give this a shot.

The VM module creates system.build.vm, which is probably what I'm currently invoking with nix-build. The nix build --help has the following example:

Build a NixOS system configuration from a flake, and make a profile point to the result:

# nix build --profile /nix/var/nix/profiles/system \
  ~/my-configurations#nixosConfigurations.machine.config.system.build.toplevel

That looks pretty similar! Surely I can change ~/my-configurations and machine appropriately, replace system.build.toplevel with system.build.vm, and that's my new build command?

$ nix build .#nixosConfigurations.vm.config.system.build.vm
warning: Git tree '/home/william/.config' is dirty
error: cannot look up '<nixpkgs/nixos/modules/profiles/qemu-guest.nix>' in pure evaluation mode (use '--impure' to override)

       at /nix/store/h7ncs2xdj5nmfwfrgnhxv4wr0q6v6fqd-source/nix/vm.nix:8:5:

            7|   imports = [
            8|     <nixpkgs/nixos/modules/profiles/qemu-guest.nix>
             |     ^
            9|     <nixpkgs/nixos/modules/virtualisation/qemu-vm.nix>
(use '--show-trace' to show detailed location information)

Progress? Now, how to import these nixpkgs modules... they're not exposed through the nixpkgs flake, are they?

The Flakes wiki page has a Making your evaluations pure section, this suggests I need to somehow download these nixpkgs modules that I'm trying to import? Can I import a module within a flake's repository directly? Surely the entire repository is already in the nix store anyway?

Yes!

"I turns out that I've had a incorrect presumption. I don't grasp the reason but even in the case of being a flake input inputs.home-manager will still return the store path of the home-manager flake. Which means I can still use it as an ordinary non-flake input. That's Great" (A comment by ykis-0-0 on reddit.com/r/NixOS)

This hint led me to add a trace to within the outputs function in flake.nix, and lo and behold, it just points to the store path for nixpkgs!

{
  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixpkgs-unstable";

    sops-nix.url = "github:Mic92/sops-nix";
    sops-nix.inputs.nixpkgs.follows = "nixpkgs"; # nasty hack to use the same nixpkgs from this flake's input
  };

  outputs = { self, nixpkgs, sops-nix }: {
    nixosConfigurations."vm" = nixpkgs.lib.nixosSystem {
      system = "x86_64-linux";
      modules = [
        (nixpkgs + "/nixos/modules/profiles/qemu-guest.nix")
        (nixpkgs + "/nixos/modules/virtualisation/qemu-vm.nix")
        ./vm.nix
        sops-nix.nixosModules.sops
      ];
    };
  };
}

Not ideal, since now the import has moved outside of the vm.nix module which actually needs these imports.

Question: Can I somehow move this import to inside the module?
(I found out the answer much later: Yes! Using specialArgs)
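For these two particular imports there is also a route that needs no extra plumbing, shown here as a sketch and not as what this article ended up doing: NixOS passes a modulesPath argument to every module, pointing at the nixos/modules directory of the nixpkgs being evaluated. Because it is supplied as a special argument, it can be used inside imports without causing infinite recursion.

```nix
# vm.nix (sketch): keep the imports next to the configuration that needs them.
{ modulesPath, ... }:

{
  imports = [
    # modulesPath is <nixpkgs>/nixos/modules for the nixpkgs in use,
    # so no <...> lookup is needed and evaluation stays pure.
    (modulesPath + "/profiles/qemu-guest.nix")
    (modulesPath + "/virtualisation/qemu-vm.nix")
  ];
}
```

The specialArgs approach is more general, since it can hand any flake input to a module, whereas modulesPath only covers modules that ship with nixpkgs itself.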

Getting another error now:

$ nix build .#nixosConfigurations.vm.config.system.build.vm
warning: Git tree '/home/william/.config' is dirty
error: getting status of '/nix/store/2awpx4cw6r4y4zxd3rnx93xq4h2mb9l3-source/nix/common.nix': No such file or directory
(use '--show-trace' to show detailed location information)

If I explore this store path, common.nix is indeed missing, but most of the files are there! Also, this store path contains a snapshot of my entire Git repository, most of which isn't needed by the Nix stuff. But that's suspicious - is it because common.nix isn't staged yet? Why indeed, it isn't. Is this what the warning is about? Maybe that warning should also specify that only Git-tracked files will be used during the build, and especially check special files like flake.nix. It's confusing for builds to immediately fail with an error that doesn't make sense unless you know that it's copying Git-tracked files to the Nix store. I have everything ignored by default in my Git repository (because it lives in ~/.config, which many applications dump files into), so I hit this problem repeatedly.

Finally, success! I've now converted to flakes, time to test out sops-nix again.

I can see sops-nix trying to do something at the start, but since I've wiped my VM its ssh-key has changed:

setting up secrets...
sops-install-secrets: Imported /etc/ssh/ssh_host_rsa_key with fingerprint cb3e5e4ce4278da9a1bb7beb7ff790665581e7ad
/nix/store/6a7866mckid9n3s8bvanp5ag5fk2awyl-sops-install-secrets-0.0.1/bin/sops-install-secrets: Failed to decrypt '/nix/store/p2inyam4wdxgljisqx035y1i8f85npi5-common.yaml': Error getting data key: 0 successful groups required, got 0
Activation script snippet 'setupSecrets' failed (1)

Time to redo part of the sops-nix setup. I get why it's done this way - using the host's SSH key is a neat idea - but I don't like that it's a manual task that needs to be done after initially booting a system. Maybe it's more of a problem with these throwaway VMs that I'm using.

Updated the age key for the VM, then ran sops updatekeys <file>. New error on boot:

setting up secrets...
sops-install-secrets: Imported /etc/ssh/ssh_host_rsa_key with fingerprint cb3e5e4ce4278da9a1bb7beb7ff790665581e7ad
/nix/store/6a7866mckid9n3s8bvanp5ag5fk2awyl-sops-install-secrets-0.0.1/bin/sops-install-secrets: Failed to decrypt '/nix/store/zl4yjsw14if8wqrblnf1lk50y01zvl6w-common.yaml': Error getting data key: 0 successful groups required, got 0
Activation script snippet 'setupSecrets' failed (1)

Is sops-nix trying to use the host's RSA key? I thought the age method only supported the ed25519 key? My mistake: the sops-nix guide says to set the following:

sops.age.sshKeyPaths = [ "/etc/ssh/ssh_host_ed25519_key" ];

Also, I was accidentally using my actual host's SSH keys because I ran $ ssh-keyscan localhost -p 2222 instead of $ ssh-keyscan -p 2222 localhost. That one's on ssh-keyscan.
On top of that, ssh-keyscan doesn't detect the ed25519 keys on the VM for some reason, so I had to copy the public key manually and pipe it through to ssh-to-age. One last $ sops updatekeys ..., and now there are secrets!

setting up secrets...
sops-install-secrets: Imported /etc/ssh/ssh_host_rsa_key with fingerprint cb3e5e4ce4278da9a1bb7beb7ff790665581e7ad

...

$ ll /run/secrets/
total 4
drwxr-x--x 2 root keys  0 May 29 12:59 nextcloud
-r-------- 1 root root 30 May 29 12:59 smtp

Went back and amended Nextcloud configuration to use sops-nix instead of Nix options.
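A minimal sketch of what that amendment might look like, assuming the secret names from the error output earlier and the adminpassFile/dbpassFile options of the NixOS Nextcloud module. The relative path to common.yaml and the nextcloud owner are assumptions made for illustration; the actual configuration may differ.

```nix
{ config, ... }:

{
  sops.defaultSopsFile = ./secrets/common.yaml;
  sops.age.sshKeyPaths = [ "/etc/ssh/ssh_host_ed25519_key" ];

  # Each secret is decrypted to a file under /run/secrets at activation.
  # Assumption: the Nextcloud service user needs to read them.
  sops.secrets."nextcloud/admin".owner = "nextcloud";
  sops.secrets."nextcloud/database".owner = "nextcloud";

  services.nextcloud.config = {
    # Point Nextcloud at the decrypted files instead of inlining passwords,
    # so only paths (not secrets) end up in the Nix store.
    adminpassFile = config.sops.secrets."nextcloud/admin".path;
    dbpassFile = config.sops.secrets."nextcloud/database".path;
  };
}
```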

Finally, time to get back to the task at hand.

Quick diversion: setting up a Nix formatter

Initially running nix fmt gives a somewhat helpful (but not exactly user-friendly) error:

warning: Git tree '/home/william/.config' is dirty
error: flake 'git+file:///home/william/.config?dir=nix' does not provide attribute 'formatter.x86_64-linux'

It should probably suggest checking nix fmt --help, because that page has a few examples to copy-paste from. I chose nixpkgs-fmt and added this to the outputs set in my flake.nix as directed:

{
  outputs = { self, nixpkgs, ... }: {
    formatter.x86_64-linux = nixpkgs.legacyPackages.x86_64-linux.nixpkgs-fmt;
  };
}

Making a package: systemd-failmsg

I use systemd-failmsg to get emails when a service fails on my webserver. I definitely want it on my new server, but it's as-yet unpackaged. So I guess I need to create a proper Nix package for it?

Let's see this wiki page. It basically says "look for something similar in nixpkgs and copy that". Which isn't ideal. Some basic templates would be nice.

Okay, I've copied something that looks reasonable:

{ stdenv, lib, fetchFromGithub, ... }:

stdenv.mkDerivation rec {
  pname = "systemd-failmsg";
  version = "1.3";

  src = fetchFromGithub {
    owner = "dino-";
    repo = pname;
    rev = "v${version}";
    sha256 = "7a2a6cc9311f1370b1c295d3cf0428604e03ced08bb894e073cd58209f5ef537";
  };

  meta = with lib; {
    homepage = "https://github.com/dino-/systemd-failmsg";
    description = "systemd toplevel override which sends emails alerts when systemd services fail";
    license = licenses.isc;
    platforms = platforms.linux;
  };
}

How do I build it? $ nix build --file ... gives me:

error: anonymous function at /home/william/.config/nix/pkgs/systemd-failmessage.nix:1:1 called without required argument 'stdenv'

The wiki article doesn't go into detail on using $ nix build to build a package. Most of the existing documentation uses nix-build instead, which obviously doesn't make use of flakes. Okay, a different approach: specifying the package in flake.nix. The wiki page has very little detail on how to use outputs.packages other than specifying that it's used by $ nix build .#<package>. nixpkgs only uses legacyPackages, so I have to search elsewhere. I looked at the Flake RFC, in which a comment mentions the callPackage pattern used by nixpkgs for its legacy packages, but within the outputs function of my flake, nixpkgs just refers to the store path. It took me a bunch of searching until I found this blog which mentions actually importing nixpkgs to provide the classic pkgs input. Makes sense once it's done in an example!

{
  # ...
  outputs = { self, nixpkgs, sops-nix }:
    let
      forAllSystems = f: nixpkgs.lib.genAttrs nixpkgs.lib.systems.flakeExposed (system: f system);
    in
    {
      packages = forAllSystems (system: import ./pkgs {
        pkgs = import nixpkgs { inherit system; };
      } );

      nixosConfigurations."vm" = nixpkgs.lib.nixosSystem {
        system = "x86_64-linux";
        modules = [
          (nixpkgs + "/nixos/modules/profiles/qemu-guest.nix")
          (nixpkgs + "/nixos/modules/virtualisation/qemu-vm.nix")
          ./vm.nix
          ./common.nix
          sops-nix.nixosModules.sops
        ];
      };

      formatter.x86_64-linux = nixpkgs.legacyPackages.x86_64-linux.nixpkgs-fmt;
    };
}

I also copied the forAllSystems function used by nixpkgs' flake.nix so my packages are cross-platform by default.
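The flake imports ./pkgs, whose contents aren't shown in this article. A minimal sketch of what such a pkgs/default.nix could contain, assuming the package file name from the earlier error message, is below. It also shows why the earlier nix build --file attempt failed: the package file is a function, and something has to call it with stdenv and friends, which is what callPackage does.

```nix
# pkgs/default.nix (hypothetical contents)
{ pkgs }:

{
  # callPackage looks at the arguments the package function asks for
  # (stdenv, lib, the fetcher, ...) and fills them in from pkgs.
  systemd-failmsg = pkgs.callPackage ./systemd-failmessage.nix { };
}
```

The attribute name is what ends up after the # in nix build .#systemd-failmsg.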

Tried $ nix build .#systemd-failmsg, and encountered a very helpful error message!

error: You meant fetchFromGitHub, with a capital H

Given how specific this error is ("with a capital H"), this is probably a hard-coded message for this specific misspelling of a function. However, it's an example of what error messages should look like in practice.

The most related section of the NixOS manual appears to be Adding custom packages, which dutifully links to the more detailed nixpkgs manual which actually covers packaging in detail. My skim-reading caused me to skip over this link - I instead searched for mkDerivation in the NixOS manual but found very little. Nevertheless, I was able to piece the puzzle together from the NixOS manual alone, working out my roadblocks:

However, after these issues I find everything as expected under result/. I'm pleasantly surprised that Nix automatically substitutes the shebang in a script which is part of the package - #!/bin/bash is replaced with #!/nix/store/.../bash. Excellent, that spares me from some tedious patching.
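For reference, here is a sketch of what the finished package expression plausibly looks like. It is reconstructed from the first draft above plus the installPhase and nativeBuildInputs values that appear in the nix show-derivation output further down in this article; the real file may differ in its details.

```nix
{ stdenv, lib, fetchFromGitHub, bash, ... }:

stdenv.mkDerivation rec {
  pname = "systemd-failmsg";
  version = "1.3";

  src = fetchFromGitHub {
    owner = "dino-";
    repo = pname;
    rev = "v${version}";
    sha256 = "7a2a6cc9311f1370b1c295d3cf0428604e03ced08bb894e073cd58209f5ef537";
  };

  # Inferred from the derivation's nativeBuildInputs: bash is only needed
  # at build time, to run the upstream install script.
  nativeBuildInputs = [ bash ];

  # The upstream install.sh installs under $PREFIX, so point it at $out.
  installPhase = ''
    PREFIX=$out bash install.sh
  '';

  # The meta attribute set from the first draft goes here unchanged.
}
```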

I can now search for my package and even show its derivation, nifty!

$ nix search . systemd-failmsg
* packages.x86_64-linux.systemd-failmsg (1.3)
  systemd toplevel override which sends emails alerts when systemd services fail

$ nix show-derivation .#systemd-failmsg
{
  "/nix/store/l14p85r1cn2fix4sisr6c1cfmgzm4x3p-systemd-failmsg-1.3.drv": {
    "outputs": {
      "out": {
        "path": "/nix/store/bxj557f73c5k8ayirj30qlh4c4lvwc5w-systemd-failmsg-1.3"
      }
    },
    "inputSrcs": [
      "/nix/store/9krlzvny65gdc8s7kpb6lkx8cd02c25b-default-builder.sh"
    ],
    "inputDrvs": {
      "/nix/store/3i6jqp61ra3031kjs1jlrmmqd2jyixd6-source.drv": [
        "out"
      ],
      "/nix/store/42pr7zqjf0y29v19q1wxn6hs5gdl5car-bash-5.1-p16.drv": [
        "dev",
        "out"
      ],
      "/nix/store/ddmyhp06jqy8bxj715zwsmbcnzvx8iax-stdenv-linux.drv": [
        "out"
      ]
    },
    "system": "x86_64-linux",
    "builder": "/nix/store/0d3wgx8x6dxdb2cpnq105z23hah07z7l-bash-5.1-p16/bin/bash",
    "args": [
      "-e",
      "/nix/store/9krlzvny65gdc8s7kpb6lkx8cd02c25b-default-builder.sh"
    ],
    "env": {
      "buildInputs": "",
      "builder": "/nix/store/0d3wgx8x6dxdb2cpnq105z23hah07z7l-bash-5.1-p16/bin/bash",
      "configureFlags": "",
      "depsBuildBuild": "",
      "depsBuildBuildPropagated": "",
      "depsBuildTarget": "",
      "depsBuildTargetPropagated": "",
      "depsHostHost": "",
      "depsHostHostPropagated": "",
      "depsTargetTarget": "",
      "depsTargetTargetPropagated": "",
      "doCheck": "",
      "doInstallCheck": "",
      "installPhase": "PREFIX=$out bash install.sh\n",
      "name": "systemd-failmsg-1.3",
      "nativeBuildInputs": "/nix/store/qcalxj277ld4jiklmd2lzx6gkcvkc67k-bash-5.1-p16-dev",
      "out": "/nix/store/bxj557f73c5k8ayirj30qlh4c4lvwc5w-systemd-failmsg-1.3",
      "outputs": "out",
      "patches": "",
      "pname": "systemd-failmsg",
      "propagatedBuildInputs": "",
      "propagatedNativeBuildInputs": "",
      "src": "/nix/store/ig1jp44jq5vy1lmdkm6ilimhq96v6157-source",
      "stdenv": "/nix/store/28hqpbwpzvpff7ldbhxdhzcpdc34lgsa-stdenv-linux",
      "strictDeps": "",
      "system": "x86_64-linux",
      "version": "1.3"
    }
  }
}

And finally, I can add my package to the machine's configuration. The question is... how? Again, this would be easier if I could pass stuff from flake.nix into child modules. Doing a bit more research, it is indeed possible, with the specialArgs attribute, which is a set that is merged with the usual parameters that are passed to imported modules. I figured out that the rec keyword means "recursive" from the NixOS manual, and it allows references to other attributes within an attribute set, so I can include my flake's packages as an argument to the other modules:

{
  outputs = { self, nixpkgs, sops-nix }@inputs:
    let
      forAllSystems = f: nixpkgs.lib.genAttrs nixpkgs.lib.systems.flakeExposed (system: f system);
    in
    {
      packages = forAllSystems (system: import ./pkgs {
        pkgs = import nixpkgs { inherit system; };
      });

      nixosConfigurations."vm" = nixpkgs.lib.nixosSystem rec {
        system = "x86_64-linux";
        # Merge all flake inputs with this flake's own packages.
        # `rec` lets specialArgs refer to `system`; the packages are reached through `self`.
        specialArgs = inputs // { mypkgs = self.packages.${system}; };
        modules = [ ./vm.nix ];
      };
    };
}

And add it to my system's configuration:

{ mypkgs, ... }:

{
  imports = [
    /* ... */
  ];

  environment.systemPackages = with mypkgs; [
    systemd-failmsg
  ];
}

Here's the tree of the package:

$ tree /nix/store/...-systemd-failmsg-1.3/
├── bin
│   └── failmsg.sh
├── lib
│   └── systemd
│       └── system
│           ├── failmsg@.service
│           ├── failmsg@.service.d
│           │   └── toplevel-override.conf
│           └── service.d
│               └── toplevel-override.conf
└── share
    └── systemd-failmsg
        ├── always-fails.service
        └── doc
            ├── LICENSE
            └── README.md

I can rebuild and run the VM successfully, but the systemd services aren't available. I do see the script in bin/failmsg.sh, but nothing else.

Instead of using the provided install script, I'll try to do things the NixOS way. My initial alternative approach involved creating a Nix module for systemd-failmsg, following the pattern used by other packages/modules that provide systemd services, e.g. nginx. I tried to create the systemd unit file with systemd.services.<service>, and a number of config files for the overrides using environment.etc. This, however, produced a number of "Permission denied" errors while building the env package. Seemingly the etc/systemd/system path was already owned by root at that point in the build. Was it already locked down and considered read-only? I didn't have any luck searching for this issue, but I assume this is an un(der)documented special case for systemd and other core system packages.

I eventually discovered the correct solution while browsing existing examples and issues. I was first directed to /run/current-system/sw, where I found my service under /lib/systemd/system as I expected. This explains why /lib doesn't exist - it's instead stashed away in here. It also shows that Nix does intelligently install the systemd service and the configuration files. The service did not however show up in systemctl list-units, suggesting I was missing another step.

This step was adding the package to the systemd.packages list, which according to the manual "enables" the service. After doing this, my service and its systemd overrides show up under /etc/systemd, and are all picked up correctly.

In the end I was left with this fairly short module:

{ config, lib, mypkgs, ... }:

with lib;

let
  cfg = config.services.systemd-failmsg;
in
{
  options = {
    services.systemd-failmsg.enable = mkEnableOption "systemd-failmsg";
  };

  config = mkIf cfg.enable {
    environment.systemPackages = [ mypkgs.systemd-failmsg ];
    systemd.packages = [ mypkgs.systemd-failmsg ];
  };
}

And enabling the service instead involved adding the following somewhere in my system's configuration:

services.systemd-failmsg.enable = true;

Conveniently, some Nextcloud services were failing on boot, which gave me a chance to test the failure message easily. Despite being able to execute the script manually successfully, when executed by systemd it couldn't find the failmsg.sh script. I found the substituteInPlace command in the manual and applied a quick fix to the derivation:

postBuild = ''
  substituteInPlace failmsg@.service \
    --replace /usr/bin/failmsg.sh $out/bin/failmsg.sh
'';

The systemd service still failed, this time because it couldn't find the commands used, including hostname, id, sendmail. A bit of digging around issues and the manual showed that I needed to set up the path correctly - makes sense, considering all the magic Nix does with $PATH.

I was initially considering using the substituteInPlace command again to replace them with the appropriate executable path, e.g. --replace hostname ${inetutils}/bin/hostname, but that didn't feel right. Looking at some more examples and the manual, I found the writeShellApplication builder. This is meant more for packages whose only output is a single shell script, but it automagically handles the script's $PATH for you, so the script can use commands as normal instead of absolute paths to the Nix store.
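For reference, a writeShellApplication package might look something like the sketch below. The package name and the script body are made-up stand-ins rather than the real failmsg.sh; the point is only that the packages listed in runtimeInputs end up on the script's $PATH.

```nix
{ writeShellApplication, coreutils, inetutils }:

writeShellApplication {
  # Hypothetical name; the result is $out/bin/failmsg-demo
  name = "failmsg-demo";

  # The bin directories of these packages are added to PATH for the script
  runtimeInputs = [ coreutils inetutils ];

  # Stand-in script body (assumption), not the real failmsg.sh
  text = ''
    echo "Unit failed on $(hostname), running as $(id -un)"
  '';
}
```

A file like this can be built with pkgs.callPackage, in the same way as the other packages in this flake.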

In the end I saw wrapProgram in some examples, which as the name suggests, wraps a program, setting environment variables as desired. I applied this to the failmsg.sh script in the derivation, adding the required dependencies to $PATH:

let
    inputs = [ inetutils system-sendmail coreutils ];
in

# ...

  nativeBuildInputs = [ makeWrapper ];
  buildInputs = [ bash ] ++ inputs;

  installPhase = ''
    PREFIX=$out bash install.sh

    wrapProgram $out/bin/failmsg.sh \
      --prefix PATH : ${lib.makeBinPath inputs}
  '';

(Question for the future: what differentiates nativeBuildInputs, buildInputs, and the like? Which are build-time only, which are runtime? What does "native" mean?)

This resolved the issue with unavailable commands, but the service was still failing due to network unavailability while the service was starting up. The problem lies upstream, due to the service only containing After=network.target, so the service doesn't have a hard requirement on network availability. I applied a fairly simple fix by adding a Requires and After to the systemd unit in the Nix module:

  systemd.services."failmsg@".unitConfig = {
    After = "network-online.target";
    Requires = "network-online.target";
  };

(TODO: should I submit this patch upstream?)

Finally, the service is working as expected, and I'm content with its Nix implementation.

Passing nix flake check

I came across the nix flake check command, which runs tests on your flake. So I thought it would be a good idea to ensure my flake passes.

The first problem I hit was making my systemd-failmsg package only available on Linux, otherwise the check produces the following error:

$ nix flake check
error: Package ‘systemd-failmsg-1.3’ in /nix/store/...-source/nix/pkgs/systemd-failmsg/default.nix:22 is not supported on ‘x86_64-darwin’, refusing to evaluate.

It was surprisingly difficult to find the appropriate method for "only create this attribute if this condition holds". After a while I eventually remembered coming across the pattern of extending with an attribute set which may be empty depending on the condition. A simple enough function which is probably just implemented like so:

c: set: if c then set else {}

In the end I found the pattern in the nixpkgs manual, in the form of the function lib.attrsets.optionalAttrs. Underneath it was the handy type specification in the lovely format I know from Haskell:

optionalAttrs :: Bool -> AttrSet
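To make the behaviour concrete, here is a small expression that can be evaluated with nix-instantiate --eval --strict. It assumes <nixpkgs> is available on NIX_PATH, and the attribute names are arbitrary.

```nix
let
  pkgs = import <nixpkgs> { };
  inherit (pkgs) lib;
in
{
  # Condition holds: the extra attribute is merged in
  withExtra = { a = 1; } // lib.optionalAttrs true { b = 2; };
  # → { a = 1; b = 2; }

  # Condition fails: optionalAttrs returns { }, so nothing is added
  withoutExtra = { a = 1; } // lib.optionalAttrs false { b = 2; };
  # → { a = 1; }
}
```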

After discovering the NixOS options search I can't help but wish there were a Hoogle equivalent for Nix/nixpkgs library functions.

Back to the problem at hand, I try using the condition pkgs.stdenv.isLinux, but this causes a strange error on nix flake check:

error: attribute 'busybox' missing

Adding --show-trace, I eventually realised that at the very start of the trace, the system causing the error is actually mipsel-linux:

 … while checking the derivation 'packages.mipsel-linux.systemd-failmsg'

 at /nix/store/cl9c0n6wlgga14mcqkv7hypc6djdab00-source/nix/pkgs/default.nix:4:3:

      3|   // pkgs.lib.optionalAttrs pkgs.stdenv.isLinux {
      4|   systemd-failmsg = pkgs.callPackage ./systemd-failmsg.nix { };
       |   ^
      5| }

… while checking flake output 'packages'

at /nix/store/3hnh67ga9pw39j495b94h4fwhgqscm37-source/nix/flake.nix:13:7:

    12|     in rec {
    13|       packages = forAllSystems (system: import ./pkgs {
      |       ^
    14|         pkgs = import nixpkgs { inherit system; };

This system appears to currently be broken, so I'll try specifically filtering out mipsel-linux.

{ pkgs }:
{ }
  // pkgs.lib.optionalAttrs (pkgs.stdenv.isLinux && pkgs.system != "mipsel-linux") {
  systemd-failmsg = pkgs.callPackage ./systemd-failmsg { };
}

And nix flake check passes!

After coming across a common flake.nix pattern, I decided to invert this system filtering, and instead only support a subset of platforms in my flake:

{
outputs = { self, nixpkgs, sops-nix }@inputs:
  let
    # Specify the list of supported platforms
    systems = [ "x86_64-linux" ];
    forAllSystems = f: nixpkgs.lib.genAttrs systems (system: f system);
  in
  rec {
    packages = forAllSystems (system: import ./pkgs {
      pkgs = import nixpkgs { inherit system; };
    });
  };
}

Considering only the platforms I care about seems like an easier way to go about this, rather than trying to support all the platforms Nix does. If I want to deploy to other platforms in the future I can simply add to systems.

Overlays: fixing nextcloud-news-updater

Nextcloud's news-updater is a handy tool for speedily updating RSS feeds. It unfortunately hard-codes running the occ command under the Nextcloud installation directory. However, NixOS has its own nextcloud-occ script which conveniently wraps the normal command with sudo and the normal Nix stuff.

I generally followed the examples on the Flakes wiki page and managed to get it patched via an overlay.

It's worth noting that I spent a long time trying to debug my first attempt, as nix flake check would result in "error: infinite recursion encountered" in fixed-point.nix. It took me a while to find that the issue was specifying the arguments as a set { self, super }: instead of two separate arguments self: super:. And this is why types and type checking are useful. That, and useful errors.
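For illustration, the general shape of such an overlay is sketched below. The file path and the replaced string are placeholders, not the actual patch I ended up with; what matters is the self: super: signature.

```nix
# An overlay takes two curried arguments (self: super:),
# not a single attribute set ({ self, super }:)
self: super: {
  nextcloud-news-updater = super.nextcloud-news-updater.overrideAttrs (old: {
    postPatch = (old.postPatch or "") + ''
      # Placeholder file and pattern (assumptions), for illustration only
      substituteInPlace path/to/file.py \
        --replace "OLD_OCC_COMMAND" "nextcloud-occ"
    '';
  });
}
```

Saved as, say, overlay.nix (a hypothetical file name), it can be applied when importing nixpkgs with import nixpkgs { inherit system; overlays = [ (import ./overlay.nix) ]; }.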

Packaging Firefox Syncserver

This was easily the most painful part of the conversion process.

At the time of writing, there's no official package for this Python app in nixpkgs, because it's stuck on Python 2 and its dependencies have been broken. The new implementation, syncstorage-rs, was recently packaged; however, I wanted to stick with the old version I was already using. Also, the new app only supports MySQL and I didn't want to run yet another database engine on top of PostgreSQL, when a SQLite database would do just fine.

Unfortunately there's no "official" tool for easily creating a Nix derivation for a Python package, with all its dependencies nicely bundled up.

I took a look at a few tools to do just that, with mixed success.

Ideally what I want is a tool that I provide a package name (and maybe version), which then pulls down all its dependencies and generates Nix derivations for them as necessary, effectively pinning them in the Nix fashion. I wouldn't mind if it autogenerated code for me, I just don't want to write it all myself.

pypi2nix

Searching through older Nix guides and discussion, pypi2nix was brought up as an easy way to import existing Python packages. However, the project's repository now is marked as archived, and it was indeed marked as broken in nixpkgs. I gave up on it quite quickly, looking for other suggested alternatives.

mach-nix

On the surface, it seems like a sensible system: you supply a list of requirements, just as in your average Python project, and dependency resolution is managed for you.

However, I quickly ran into trouble with Python 2 support, with many packages showing "not supported for interpreter python2.7" errors. mach-nix isn't able to handle evaluation errors due to packages marked as broken, which appears to be more of a problem with the Nix language. This apparently makes it hard for mach-nix to do its dependency resolution without running into evaluation errors for Python 2 packages.

I tried manually editing mach-nix to use tryEval in a few more places, but I couldn't reliably fix the problem. Local testing in the nix repl seemed to reliably prevent errors when evaluating derivations directly, but it failed to catch evaluation errors for a derivation's dependencies. It also wasn't easy to get a list of all of a derivation's dependencies (see an attempt here), which made trying to fix this even more tedious.

At this point I also realised the underlying method mach-nix uses to make things "pure" as Nix requires: fetching a Git repository which indexes everything available in PyPI and other Python repositories. This index currently sits at over 400MB, which I'm sure will only grow with time. Personally, I feel this is a sledgehammer approach. I appreciate Nix already isn't particularly conservative when it comes to disk usage, but this is extreme.

poetry2nix

I initially thought poetry2nix was only for packages already using Poetry, so I overlooked it in favour of trying out other Python-to-Nix tools. Reading a bit more into it, I realised I could repackage syncserver myself.

I add python3Packages.poetry to my dev shell, and go through the poetry init process, requiring python 2.7. But when trying to poetry add some dependencies, it complains about the current Python version not being 2.7. It's a bit confusing that a build tool needs to run on the version the end application uses, but I suppose it may need to evaluate the setup.py script. This initially wasn't easy, poetry2nix doesn't officially support Python 2 because poetry is marked as broken for Python 2 in nixpkgs, due to

After running poetry env use python2.7, it complains about my current version 2.7.xx not matching 2.7. Of course, I need to make the version requirement fuzzy by instead specifying ^2.7. After the adjustment, poetry env use python2.7 works.

poetry add got a bit further now, but started complaining about the explicit version requirements conflicting with transitive dependencies. To my horror, it turns out pip doesn't do proper dependency resolution; apparently it simply accepts the first version of a package that is requested. Naturally, Poetry doesn't provide an escape-hatch for this situation, making it a pain to port packages relying on this broken pip behaviour to Poetry. My solution was to include and patch a nested copy of the broken dependency. This could probably be replaced with the usual Nix source patching mechanism along with a poetry2nix override.

Before working out how poetry2nix overrides worked, I tried manually adjusting the poetry.lock file to change the source of another broken dependency, umemcache, which has unreleased fixes.

Annoyingly, I hit a poetry bug which interfered with poetry2nix, where dependency metadata wasn't being specified, which is what poetry2nix relies on to make all the dependency fetching pure. I fixed this with a nix flake update to nixpkgs. Unfortunately this introduced more issues, including an infinite recursion error, and a large number of packages requiring setuptools to be added to their dependencies. I managed to work around both with a tedious number of poetry2nix overrides.

Another limitation I hit was Poetry not locking URL dependencies, causing confusing errors about hashes being required when building with poetry2nix. The fix is to override the src of the offending packages, repeating the URL and providing the correct hash.
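The two kinds of override mentioned above look roughly like the following sketch. The package names some-dependency and url-dependency and the URL are placeholders, and the hash is deliberately fake; Nix reports the real one on the first build attempt.

```nix
{ poetry2nix, python27, fetchurl, lib }:

poetry2nix.mkPoetryApplication {
  projectDir = ./.;
  python = python27;

  overrides = poetry2nix.overrides.withDefaults (self: super: {
    # Placeholder name: a package that needs setuptools at build time
    some-dependency = super.some-dependency.overridePythonAttrs (old: {
      buildInputs = (old.buildInputs or [ ]) ++ [ self.setuptools ];
    });

    # Placeholder name: a URL dependency that Poetry did not lock,
    # so the source and its hash are repeated here
    url-dependency = super.url-dependency.overridePythonAttrs (old: {
      src = fetchurl {
        url = "https://example.org/url-dependency.tar.gz"; # placeholder URL
        sha256 = lib.fakeSha256; # replace with the hash Nix reports
      };
    });
  });
}
```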

After this, I finally had a build-able package. I then used the dependencyEnv attribute to get a store path containing all the dependencies, including Gunicorn, which my service needs to execute. I hit a collision between some files, so I had to set ignoreCollisions = true;:

Collision between backports-functools-lru-cache and python2.7-configparser: python2.7/site-packages/backports/__init__.pyc

Configuring the service

From there, I amended my NixOS module for firefox-syncserver until it was up and running. Some fun problems:

Nix: confusing errors when generating the config file with format.generate and recursiveUpdate. I've forgotten the details of what caused this.

Gunicorn: fun trial and error with the systemd service hardening, additionally needed the @chown and @setuid permissions, for changing the owner of a file in /tmp, and setting the user for worker processes respectively.

Gunicorn: having no logging by default! And making it tedious to configure logging.

Syncserver: forgetting the https:// in the public_url config option causes the annoying mismatch between public_url and the origin URL received by Gunicorn. The error message also unhelpfully duplicates the received URL, causing me to confuse the issue for a bug in Nginx or Gunicorn itself.
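As an illustration of the Gunicorn hardening point, the system call filter might end up looking like the sketch below. The unit name and the starting filter (@system-service with @privileged removed) are assumptions, not a copy of my module; only the two re-allowed groups come from the notes above.

```nix
{
  # Hypothetical unit name
  systemd.services.firefox-syncserver.serviceConfig = {
    SystemCallFilter = [
      "@system-service" # assumed baseline allow-list
      "~@privileged"    # assumed: drop privileged calls...
      "@chown"          # ...then re-allow chown for the file in /tmp
      "@setuid"         # ...and setuid for the worker processes
    ];
  };
}
```

systemd applies the entries in order, so the later allow entries add back what the deny entry removed.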

My website

I use a fairly simple deployment method for my website: $ git push with a post-receive hook on the receiving end, which rebuilds the website using Zola (and Graphviz for some diagrams).

This introduces a problem to the Nix system configuration for the webserver: cloning a Git repository and updating it separately is an impure operation. It doesn't make much sense to wrap the website in a Nix package if I want to be able to painlessly update the website separately to the system it runs on. However, I would like the NixOS configuration to include the initial setup of the repository (cloning it, and setting up that post-receive hook).

While searching for potential methods of implementing this, I came across system.activationScripts in other people's configurations. This seems to fit the bill - it allows you to specify arbitrary scripts that run on boot and nixos-rebuild. I created a script that clones the repository to an appropriate location, and installs a wrapped version of the post-receive hook.
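A minimal sketch of this idea follows. The paths are hypothetical, and to keep it self-contained it initialises an empty bare repository rather than cloning one; the hook body is also a simplified stand-in for my real one.

```nix
{ pkgs, ... }:

let
  # Hypothetical locations
  repo = "/var/lib/website/repo.git";
  workTree = "/var/lib/website/checkout";

  # Simplified stand-in hook: check out the pushed tree and rebuild with Zola
  postReceive = pkgs.writeShellScript "post-receive" ''
    set -eu
    mkdir -p ${workTree}
    ${pkgs.git}/bin/git --work-tree=${workTree} checkout -f
    cd ${workTree}
    # Graphviz is put on PATH for the diagrams
    PATH=${pkgs.graphviz}/bin:$PATH ${pkgs.zola}/bin/zola build
  '';
in
{
  # Runs on boot and on every nixos-rebuild switch, so it must be idempotent
  system.activationScripts.website = ''
    if [ ! -d ${repo} ]; then
      ${pkgs.git}/bin/git init --bare ${repo}
    fi
    ln -sfn ${postReceive} ${repo}/hooks/post-receive
  '';
}
```

Activation scripts run as root, so a real version also needs to hand ownership of the repository to whichever user receives the pushes.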

Deploying

I used nixos-infect, as a cloud-init script when allocating a VM on Hetzner (with Ubuntu as the base OS). It works perfectly, and after a reboot the new VM claims it is NixOS:

$ cat /etc/os-release
BUG_REPORT_URL="https://github.com/NixOS/nixpkgs/issues"
BUILD_ID="22.05.3475.ed9b904c5eb"
DOCUMENTATION_URL="https://nixos.org/learn.html"
HOME_URL="https://nixos.org/"
ID=nixos
LOGO="nix-snowflake"
NAME=NixOS
PRETTY_NAME="NixOS 22.05 (Quokka)"
SUPPORT_URL="https://nixos.org/community.html"
VERSION="22.05 (Quokka)"
VERSION_CODENAME=quokka
VERSION_ID="22.05"

Then I attempted to deploy my system configuration with nixos-rebuild:

$ nixos-rebuild switch --target-host atlas2 --flake .#webserver
error:
       Failed assertions:
       - The ‘fileSystems’ option does not specify your root file system.
       - You must set the option ‘boot.loader.grub.devices’ or 'boot.loader.grub.mirroredBoots' to make the system bootable.

Probably missing the /etc/nixos/hardware-configuration.nix. Copied it from the VM and included it in the system config.

Another error:

error:
       Failed assertions:
       - You must set the option ‘boot.loader.grub.devices’ or 'boot.loader.grub.mirroredBoots' to make the system bootable.

Weird that nixos-infect doesn't sort out the bootloader for you. I manually set boot.loader.grub.device = "/dev/sda"; and that sorted that.

The next deploy worked, but there were a couple of errors, because I forgot to update the sops keys.

I also messed up and set PermitRootLogin no in the SSH config without setting up a user with sudo access, so I rolled back over the existing SSH connection:

$ nixos-rebuild switch --rollback

Glad to see that switching works without rebooting! It even disables & stops services, deletes groups, and all that stuff.

While sorting this out, it was annoying having to swap my local SSH config back-and-forth between root and a normal user.

Found the --use-remote-sudo option for nixos-rebuild, but this seems to require passwordless sudo. I set that up with the following options:

users.users.<user>.extraGroups = [ "wheel" ];
security.sudo.wheelNeedsPassword = false;

Re-did the deploy with --use-remote-sudo and it had the same result, so I'm taking it as done.

Some strange errors on the next rebuild:

error: cannot add path '/nix/store/1rhrxs9n736a4f3gqlqmi211xnhi10ka-nextcloud-config.php' because it lacks a valid signature

From this issue, it appears to be caused by nixos-rebuild copying things from my local store to the server's store - by default copying is apparently only allowed from cache.nixos.org.

At this point I messed up the SSH and sudo permissions and was locked out of root. I also didn't have a root password, so I went through Hetzner rescue. The lack of a normal $PATH made it somewhat tedious to run the commands needed, including bash and passwd. I had to search the store for them.

The "valid signature" issue turned out to be due to swapping to the normal user with --use-remote-sudo. Apparently one needs to tweak nix.settings.trusted-users to include any users that are able to nixos-rebuild. To resolve this, I added @wheel to this setting, and ran nixos-rebuild from the machine itself to work around the issue. I added this to the wiki page to highlight the problem before it occurs.
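Put together, the options needed for deploying as a normal user with --use-remote-sudo look roughly like this. The username alice is a placeholder.

```nix
{
  # Placeholder user that nixos-rebuild connects as
  users.users.alice = {
    isNormalUser = true;
    extraGroups = [ "wheel" ];
  };

  # --use-remote-sudo cannot answer a password prompt
  security.sudo.wheelNeedsPassword = false;

  # Let members of wheel copy unsigned store paths from my local machine
  nix.settings.trusted-users = [ "root" "@wheel" ];
}
```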

Performing the migration

At this point I had a fully functional NixOS system, reproducing just about everything I needed from the old server! As a temporary measure, I replicated my normal website DNS entries but with a prefix: new.williamvds.me - this was quite easy since I'd made the domain name a Nix option.
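Making the domain an option can be sketched as below. The option name site.domain, the cloud subdomain, and the virtual host settings are illustrative assumptions rather than my actual module.

```nix
{ config, lib, ... }:

{
  # Hypothetical option name
  options.site.domain = lib.mkOption {
    type = lib.types.str;
    default = "example.org";
    description = "Base domain that all virtual hosts are derived from.";
  };

  config.services.nginx.virtualHosts = {
    # Every host name is derived from the single option
    "${config.site.domain}" = { forceSSL = true; enableACME = true; };
    "cloud.${config.site.domain}" = { forceSSL = true; enableACME = true; };
  };
}
```

With this in place, setting site.domain = "new.example.org"; in the machine's configuration moves every virtual host to the prefixed name in one go.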

I followed some standard data migration procedures, copying over my Nextcloud data and PostgreSQL database. Things generally "just worked", though I noticed a few odd things that needed minor adjustments to options.

Some testing ensued: I made sure all the new services worked by logging into them on my devices. After some more hassle with firefox-syncserver, I was fairly confident all was in order, so I migrated across by editing the normal DNS entries to point to the new server's IP address.