Overview
In this short guide, we will set up iocaine from scratch on NixOS, using nixocaine and iocaine’s built-in handler. Despite its simplicity, the built-in handler is powerful: in this author’s experience, it routes the vast majority of unwanted visitors into iocaine’s maze.
We will not be writing a custom request handler; that will be the subject of a separate guide.
Important
On these pages, we’ll document the process of setting up iocaine on NixOS. If you want to run iocaine on a traditional Linux distribution, see the main Getting Started guide. If you want to run it in a container, there’s a container guide too.
Requirements
In this guide, we’re going to set up iocaine on NixOS, using flakes, because that’s what this author is familiar with. Let us then take a moment to add nixocaine to our flake inputs!
{
  inputs = {
    nixpkgs.url = "github:nixos/nixpkgs/nixos-25.11";
    nixocaine = {
      url = "https://git.madhouse-project.org/iocaine/nixocaine/archive/stable.tar.gz";
    };
  };
}
The flake exposes a NixOS module (inputs.nixocaine.nixosModules.default), and an overlay (inputs.nixocaine.overlays.default). The NixOS module automatically applies the overlay, too. This documentation covers the main branch of nixocaine.
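The inputs alone do nothing until the module is imported into a system configuration. Below is a minimal sketch of the matching outputs. It assumes a single x86_64-linux host named myhost (a hypothetical name) whose remaining settings live in ./configuration.nix (also an assumption):

```nix
{
  inputs = {
    nixpkgs.url = "github:nixos/nixpkgs/nixos-25.11";
    nixocaine = {
      url = "https://git.madhouse-project.org/iocaine/nixocaine/archive/stable.tar.gz";
    };
  };

  outputs = { nixpkgs, nixocaine, ... }: {
    # "myhost" is a placeholder; use your own host name.
    nixosConfigurations.myhost = nixpkgs.lib.nixosSystem {
      system = "x86_64-linux";
      modules = [
        # Provides services.iocaine and applies the nixocaine overlay.
        nixocaine.nixosModules.default
        # Assumed location of the rest of this host's configuration.
        ./configuration.nix
      ];
    };
  };
}
```

With this in place, `nixos-rebuild switch --flake .#myhost` builds and activates the configuration.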
For auxiliary data, such as a list of self-identifying scrapers or a better corpus to train on, we’ll use /etc/iocaine/data rather than the Nix store. Packaging them into the Nix store is left as an exercise for the reader, but the author’s configuration may offer some inspiration: a robots.json derivation, and a package of Orwell’s 1984.
Let’s create that directory!
# mkdir -p /etc/iocaine/data/corpus
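If you would rather have NixOS create the directory at boot than run mkdir by hand, systemd-tmpfiles rules can do it. Here is a sketch, assuming root ownership and world-readable permissions are acceptable for this data:

```nix
{
  # "d" creates each directory if it is missing, and adjusts mode and
  # ownership if it already exists. Mode 0755, owned by root:root; the
  # trailing "-" means no age-based cleanup.
  systemd.tmpfiles.rules = [
    "d /etc/iocaine/data 0755 root root -"
    "d /etc/iocaine/data/corpus 0755 root root -"
  ];
}
```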
Getting familiar with iocaine
Let’s start simple! Assuming we have already imported the inputs.nixocaine.nixosModules.default module, the services.iocaine attribute set is available. Let’s enable it:
services.iocaine.enable = true;
Activate the configuration, and let’s see if it works:
# systemctl status iocaine.service
● iocaine.service - iocaine
Loaded: loaded (/etc/systemd/system/iocaine.service; enabled; preset: ignored)
Active: active (running) since Fri 2025-10-24 10:49:48 UTC; 2min 12s ago
Invocation: fc4832d259da4d2894a4b5e846a0c217
Main PID: 846 (iocaine)
IP: 0B in, 0B out
IO: 0B read, 0B written
Tasks: 2 (limit: 1136)
Memory: 35.7M (peak: 36M)
CPU: 121ms
CGroup: /system.slice/iocaine.service
└─846 /nix/store/bz9hj2whvrc21i8zqq21x99dw2wn8xyw-iocaine-3.0.0-snapshot/bin/iocaine start
Oct 24 10:49:48 iocainedefault systemd[1]: Starting iocaine...
Oct 24 10:49:48 iocainedefault iocaine[846]: 2025-10-24T10:49:48.617118Z WARN iocaine::user: No ai-robots-txt-path configured!
Oct 24 10:49:48 iocainedefault systemd[1]: Started iocaine.
Does it work?
# curl -sI http://127.0.0.1:42069/
HTTP/1.1 421 Misdirected Request
content-length: 0
date: Fri, 24 Oct 2025 10:53:33 GMT
# curl -sI http://127.0.0.1:42069/ -A Perplexity
HTTP/1.1 200 OK
content-type: text/html
content-length: 2467
date: Fri, 24 Oct 2025 10:54:08 GMT
Yes, it does. Wonderful! Now let’s make it more useful.
Adjusting the configuration
Unlike the setups in the Getting Started and Getting Started with containers guides, the NixOS module does not support configuration snippets when configuring through Nix attributes. It is possible to point iocaine at configuration files and directories managed outside of Nix by setting the services.iocaine.configPaths attribute, but we will not be doing that in this guide; we will configure iocaine through attributes instead.
We do so through the services.iocaine.config attribute set. It follows the same structure as the serialized configuration, but in Nix syntax. Since that structure is already documented, it will not be repeated here.
Since we need a full configuration and cannot augment the default, we’ll start by declaring a configuration that matches the default, written in Nix:
services.iocaine.enable = true;
services.iocaine.config.server.default = {
  bind = "127.0.0.1:42069";
  mode = "http";
  use.handler-from = "default";
};
services.iocaine.config.handler.default = { };
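Because this attribute set is serialized to JSON before iocaine reads it (see services.iocaine.config under Available attributes), it can help to preview roughly what iocaine will receive. The sketch below uses Nix’s built-in builtins.toJSON. It assumes the module does a plain JSON serialization, so the actual file may differ in formatting, and the file name preview.nix is hypothetical:

```nix
# preview.nix: evaluate with `nix eval --raw --file preview.nix`
let
  # The same attribute set as services.iocaine.config above.
  iocaineConfig = {
    server.default = {
      bind = "127.0.0.1:42069";
      mode = "http";
      use.handler-from = "default";
    };
    handler.default = { };
  };
in
builtins.toJSON iocaineConfig
# Attribute names come out sorted, so this evaluates to:
# {"handler":{"default":{}},"server":{"default":{"bind":"127.0.0.1:42069","mode":"http","use":{"handler-from":"default"}}}}
```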
This is all fine, and will already stop a lot of crawlers, but there’s still that warning about ai-robots-txt-path. You see, iocaine ships with a copy of ai.robots.txt’s robots.json to ward off crawlers that identify themselves. But iocaine’s copy is only updated when a new iocaine release is cut, and we may wish to update it more often than that.
To do so, let’s grab the most recent copy directly from ai.robots.txt’s main branch:
# curl -L https://github.com/ai-robots-txt/ai.robots.txt/raw/refs/heads/main/robots.json \
-o /etc/iocaine/data/ai.robots.txt-robots.json
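A copy fetched by hand goes stale again over time. Below is a sketch of a weekly refresh as a NixOS systemd service plus timer. The unit name iocaine-robots-update is hypothetical. The restart at the end assumes iocaine only reads the file at startup, which this guide does not confirm:

```nix
{ pkgs, ... }:
{
  systemd.services.iocaine-robots-update = {
    description = "Refresh ai.robots.txt robots.json for iocaine";
    serviceConfig.Type = "oneshot";
    path = [ pkgs.curl ];
    # NixOS runs this script with `-e`, so a failed curl stops it early.
    script = ''
      target=/etc/iocaine/data/ai.robots.txt-robots.json
      # Download to a temporary file first, so a failed fetch never
      # replaces the existing copy.
      curl -fsSL -o "$target.tmp" \
        https://github.com/ai-robots-txt/ai.robots.txt/raw/refs/heads/main/robots.json
      mv "$target.tmp" "$target"
      # Assumption: iocaine only reads this file at startup.
      systemctl try-restart iocaine.service
    '';
  };

  # A timer with the same name activates the service above.
  systemd.timers.iocaine-robots-update = {
    wantedBy = [ "timers.target" ];
    timerConfig = {
      OnCalendar = "weekly";
      Persistent = true; # catch up on runs missed while powered off
    };
  };
}
```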
Earlier, we saw a warning about ai-robots-txt-path not being configured. Now that we have a copy of this file, let’s tell iocaine about it!
services.iocaine.config.handler.default.config = {
  "ai-robots-txt-path" = "/etc/iocaine/data/ai.robots.txt-robots.json";
};
After activating the new configuration, we can now test if this works:
# curl -Is http://127.0.0.1:42069/ -A ClaudeBot
HTTP/1.1 200 OK
content-type: text/html
content-length: 2467
date: Fri, 24 Oct 2025 11:13:44 GMT
The built-in handler supports many more options. We won’t repeat all of them here, only some of the more important ones, like the ai-robots-txt-path option we’ve just played with.
A better corpus
Out of the box, iocaine uses its own source code as its corpus. While that does produce completely nonsensical garbage, it is neither a particularly big nor a particularly varied corpus. We can do better. How about Orwell’s 1984, combined with Huxley’s Brave New World?
Let’s grab a copy of both from archive.org!
# curl -L https://archive.org/download/GeorgeOrwells1984/1984_djvu.txt \
-o /etc/iocaine/data/corpus/1984.txt
# curl -L https://archive.org/download/ost-english-brave_new_world_aldous_huxley/Brave_New_World_Aldous_Huxley_djvu.txt \
-o /etc/iocaine/data/corpus/brave-new-world.txt
We could use these books as our wordlist too, but a dedicated wordlist collection gives us a larger set of words. We’re going to grab one from GNU miscfiles:
# curl -L https://git.savannah.gnu.org/cgit/miscfiles.git/plain/web2 \
-o /etc/iocaine/data/corpus/words.txt
Let’s adjust our configuration, pointing iocaine at the files we just downloaded:
services.iocaine.config.handler.default.config = {
  "ai-robots-txt-path" = "/etc/iocaine/data/ai.robots.txt-robots.json";
  sources = {
    "training-corpus" = [
      "/etc/iocaine/data/corpus/1984.txt"
      "/etc/iocaine/data/corpus/brave-new-world.txt"
    ];
    "wordlists" = [ "/etc/iocaine/data/corpus/words.txt" ];
  };
};
Once we deploy the new configuration, the generated garbage will be far less rusty.
Observing the Crawlers
The built-in handler supports metrics too. If a server in prometheus mode is configured, it will expose the following metrics (along with the usual process metrics):
qmk_requests{host}
The number of requests served, keyed by host.
qmk_ruleset_hits{ruleset, outcome}
The number of times a particular ruleset was hit, and its outcome. The outcome is either garbage or default, and the ruleset is one of ai.robots.txt, major-browsers, unwanted-visitors, or default.
qmk_garbage_generated{host}
The amount of garbage generated, in bytes, keyed by host.
We like big numbers and pretty graphs, so while there is no example dashboard (yet), we can still enable a Prometheus server, and start collecting! We’ll also tell iocaine to persist these metrics, so that we don’t start from zero every time iocaine is restarted. We’ll make a few quick adjustments to our configuration:
services.iocaine.config.server.default = {
  bind = "127.0.0.1:42069";
  mode = "http";
  use.handler-from = "default";
  use.metrics = "metrics"; # <- this is a new line added to this attrset!
};
services.iocaine.config.server.metrics = {
  bind = "127.0.0.1:42042";
  mode = "prometheus";
  persist-path = "qmk-metrics.json";
  persist-interval = "1h";
};
After deploying the configuration, the metrics will be available immediately at http://127.0.0.1:42042/metrics:
# curl -s http://127.0.0.1:42042/metrics | grep '^iocaine_version'
iocaine_version{version="3.0.0-snapshot"} 1
The metrics listed above will appear here as soon as iocaine has seen some traffic. Let’s give it some, and check the metrics!
# curl -s http://127.0.0.1:42069/ >/dev/null
# curl -s http://127.0.0.1:42069/ -A ClaudeBot >/dev/null
# curl -s http://127.0.0.1:42069/ -A Perplexity >/dev/null
# curl -s http://127.0.0.1:42069/ -A "Mozilla/5.0 Firefox/0" >/dev/null
# curl -s http://127.0.0.1:42042/metrics | grep '^qmk_'
qmk_garbage_generated{host="127.0.0.1:42069"} 5370
qmk_requests{host="127.0.0.1:42069"} 4
qmk_ruleset_hits{outcome="default",ruleset="default"} 1
qmk_ruleset_hits{outcome="garbage",ruleset="ai.robots.txt"} 1
qmk_ruleset_hits{outcome="garbage",ruleset="major-browsers"} 1
qmk_ruleset_hits{outcome="garbage",ruleset="unwanted-visitors"} 1
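iocaine only exposes these metrics; something still has to scrape and store them. Below is a sketch using the NixOS Prometheus module, assuming Prometheus runs on the same host. The job name iocaine is arbitrary:

```nix
{
  services.prometheus = {
    enable = true;
    scrapeConfigs = [
      {
        job_name = "iocaine";
        # The metrics server declared above. Prometheus scrapes
        # /metrics by default, so no metrics_path is needed.
        static_configs = [ { targets = [ "127.0.0.1:42042" ]; } ];
      }
    ];
  };
}
```

Prometheus then serves its own UI on its default port, 9090. If qmk_requests is a counter (an assumption), a query such as rate(qmk_requests[5m]) there shows requests per second for each host.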
Available attributes
For the sake of completeness, we provide a list of available attributes within the services.iocaine attrset.
services.iocaine.enable
A boolean value (defaulting to false). When enabled, the iocaine service will be enabled and started, with the configuration declared herein.
services.iocaine.package
The iocaine package to use. Defaults to pkgs.iocaine-unstable, as provided by nixocaine.
services.iocaine.environment
A set of environment variables to put into iocaine’s environment when started via systemd, empty by default.
services.iocaine.config
Configuration for iocaine, following the structure of the serialized configuration, written as Nix expressions. It is serialized to JSON, and iocaine is pointed at the resulting configuration file.
Use this attribute set if you wish to configure iocaine through Nix expressions. Mutually exclusive with services.iocaine.configPaths.
services.iocaine.configPaths
Configuration for iocaine, in KDL or any other supported format, managed outside of Nix. The attribute takes a list of strings: paths to configuration files or directories.
This is mutually exclusive with services.iocaine.config, and this is the recommended way to configure iocaine if you’re managing your configuration outside of Nix.
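For comparison, here is a sketch of the configPaths route. The directory /etc/iocaine/config.d is a hypothetical location; its contents would be maintained outside of Nix:

```nix
{
  services.iocaine.enable = true;
  # Mutually exclusive with services.iocaine.config.
  services.iocaine.configPaths = [ "/etc/iocaine/config.d" ];
}
```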
Final remarks
Now that we have iocaine up and running on NixOS, we can integrate it with our reverse proxy of choice. Doing so is outside the scope of this document; consult the available NixOS attributes for your reverse proxy of choice.
Now go, tweak it further, watch the metrics, and see the Crawlers get trapped in the maze.