Overview
In this short guide, we will set up iocaine from scratch, using its built-in handler. Despite its simplicity, the built-in handler is still powerful and, in this author’s experience, will route the vast majority of unwanted visitors into iocaine’s maze.
We will not be writing a custom request handler; that will be covered in a separate guide.
Important
On these pages, we’ll document the process of setting up iocaine running directly on the host, with systemd. If you want to run iocaine in a container, see the Getting Started with iocaine & containers guide. If you’re on Debian (or a derivative), see the Debian guide. If you’re using NixOS, there’s a guide for that, too.
Requirements
The requirements below are for this guide, not necessarily for iocaine itself. It can be used with reverse proxies other than Caddy, and works on Linux systems without systemd, and even on non-Linux systems like FreeBSD. A lot of this guide will apply to systems that fall outside of what’s documented here, but some parts will have to be adapted.
- An x86_64 or aarch64 Linux system with systemd
- Caddy as the reverse proxy
Throughout this guide, we’ll be using /opt/iocaine: we’ll download a pre-built binary there, place our custom configuration under it, and so on. To make things easier, the following layout is what this guide will assume going forward:
# tree -d /opt/iocaine
/opt/iocaine
├── bin
├── etc
│ ├── iocaine
│ │ └── config.d
│ └── systemd
├── share
│ └── data
└── var
└── lib
└── iocaine
Let’s create those directories first!
# mkdir -p /opt/iocaine/bin \
/opt/iocaine/etc/iocaine/config.d /opt/iocaine/etc/systemd \
/opt/iocaine/share/data \
/opt/iocaine/var/lib/iocaine
Getting familiar with iocaine
Before we begin our journey of configuring iocaine, let us take a moment to see what it can do out of the box, without any configuration whatsoever. To do that, we’ll download a prebuilt binary into /opt/iocaine/bin.
Let’s grab that binary! Assuming we’re on an x86_64 or aarch64 Linux system, we can do that like this:
# curl -sL \
https://iocaine.madhouse-project.org/_/v3.2.0/tarball/$(uname -m)-linux \
| tar -C /opt/iocaine/bin --zstd -x
Let’s start it up! It will print a couple of warnings, and keep running. We’ll talk more about those warnings a little later.
# /opt/iocaine/bin/iocaine
2025-12-23T11:18:00.534604Z WARN iocaine::user: No ai-robots-txt-path configured, using default
2025-12-23T11:18:00.537044Z WARN iocaine::user: No unwanted-asns.db-path configured, check disabled
We can get more logs out of it by setting the RUST_LOG environment variable to iocaine=trace (the supported log levels are, in order of decreasing verbosity: trace, debug, info, warn, and error):
# RUST_LOG=iocaine=trace /opt/iocaine/bin/iocaine
2025-12-23T11:18:33.666834Z DEBUG iocaine::sex_dungeon::means_of_production: using the embedded handler
2025-12-23T11:18:33.669592Z TRACE iocaine::sex_dungeon::means_of_production: compiling init
2025-12-23T11:18:33.748482Z TRACE iocaine::sex_dungeon::means_of_production: compilation finished
2025-12-23T11:18:33.748542Z TRACE iocaine::sex_dungeon::means_of_production: running init
2025-12-23T11:18:33.748567Z DEBUG iocaine::user: Registering metrics
2025-12-23T11:18:33.748712Z WARN iocaine::user: No ai-robots-txt-path configured, using default
2025-12-23T11:18:33.751172Z INFO iocaine::user: using default unwanted asns
2025-12-23T11:18:33.751228Z WARN iocaine::user: No unwanted-asns.db-path configured, check disabled
2025-12-23T11:18:33.786535Z DEBUG iocaine::user: Loading embedded HTML template
2025-12-23T11:18:33.786599Z DEBUG iocaine::user: Initializing template engine
2025-12-23T11:18:33.786647Z INFO iocaine::user: poison-id: D-1vTTGPXBeG8dH4Shj3_A
2025-12-23T11:18:33.786782Z TRACE iocaine::sex_dungeon::means_of_production: init finished
2025-12-23T11:18:33.788322Z TRACE iocaine::sex_dungeon::means_of_production: compiling the main script
2025-12-23T11:18:33.814856Z TRACE iocaine::sex_dungeon::means_of_production: compilation finished
2025-12-23T11:18:33.815439Z INFO iocaine::morgue: starting iocaine
2025-12-23T11:18:33.815655Z INFO iocaine::morgue: iocaine ready
We can do all kinds of fancy things with logging, like setting the log level to trace for iocaine::user messages, to info for iocaine::morgue, and to warn for anything else under iocaine:
# RUST_LOG=iocaine=warn,iocaine::morgue=info,iocaine::user=trace /opt/iocaine/bin/iocaine
2025-12-23T11:19:04.426970Z DEBUG iocaine::user: Registering metrics
2025-12-23T11:19:04.427307Z WARN iocaine::user: No ai-robots-txt-path configured, using default
2025-12-23T11:19:04.429683Z INFO iocaine::user: using default unwanted asns
2025-12-23T11:19:04.429725Z WARN iocaine::user: No unwanted-asns.db-path configured, check disabled
2025-12-23T11:19:04.465190Z DEBUG iocaine::user: Loading embedded HTML template
2025-12-23T11:19:04.465330Z DEBUG iocaine::user: Initializing template engine
2025-12-23T11:19:04.465428Z INFO iocaine::user: poison-id: D-1vTTGPXBeG8dH4Shj3_A
2025-12-23T11:19:04.494161Z INFO iocaine::morgue: starting iocaine
2025-12-23T11:19:04.494363Z INFO iocaine::morgue: iocaine ready
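To make the directive syntax a little more concrete, here is a minimal Python sketch of how a filter string like the one above can be resolved for a given log target: the most specific matching target wins, and a message is printed if its level is at least as severe as that directive’s level. This is a simplified model of the target=level directives understood by Rust logging crates such as tracing-subscriber, not their actual parser; the function names are our own, bare directives without = are ignored, and treating unmatched targets as disabled is a simplification of this sketch.

```python
from __future__ import annotations

# Levels ordered from most to least verbose.
LEVELS = ["trace", "debug", "info", "warn", "error"]


def parse_directives(spec: str) -> dict[str, str]:
    """Turn "a=warn,a::b=info" into {"a": "warn", "a::b": "info"}."""
    directives = {}
    for part in spec.split(","):
        target, sep, level = part.strip().partition("=")
        if sep:  # bare directives (no "=") are out of scope for this sketch
            directives[target] = level.lower()
    return directives


def effective_level(directives: dict[str, str], target: str) -> str | None:
    """Level of the most specific (longest) directive matching `target`, if any."""
    best_name = None
    for name in directives:
        if target == name or target.startswith(name + "::"):
            if best_name is None or len(name) > len(best_name):
                best_name = name
    return directives[best_name] if best_name is not None else None


def is_enabled(spec: str, target: str, level: str) -> bool:
    """Would a `level` message from `target` be printed? Unmatched targets: no."""
    threshold = effective_level(parse_directives(spec), target)
    if threshold is None:
        return False
    return LEVELS.index(level) >= LEVELS.index(threshold)


spec = "iocaine=warn,iocaine::morgue=info,iocaine::user=trace"
print(is_enabled(spec, "iocaine::user", "debug"))  # True
print(is_enabled(spec, "iocaine::morgue", "debug"))  # False
print(is_enabled(spec, "iocaine::sex_dungeon::means_of_production", "trace"))  # False
```

This matches what we saw above: the DEBUG lines from iocaine::user made it through, while the TRACE lines from iocaine::sex_dungeon::means_of_production did not, since only the iocaine=warn directive covers them.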
Okay, that was an educational side quest, but does it actually work? Let’s curl it (from another terminal)!
# curl -is http://127.0.0.1:42069/
HTTP/1.1 421 Misdirected Request
content-length: 0
date: Tue, 23 Dec 2025 11:19:27 GMT
Why that port? Because that’s where iocaine binds by default! Let’s check its config:
# /opt/iocaine/bin/iocaine show config
initial-seed ""
state-directory ""
http-server default {
bind "127.0.0.1:42069"
use handler-from=default
}
declare-handler default language=roto
The “421 Misdirected Request” response is iocaine’s signal that the real content should be served. We can also test what happens if we send it a request that it decides to answer with garbage:
# curl -Is http://127.0.0.1:42069/ -A Perplexity
HTTP/1.1 200 OK
content-type: text/html
content-length: 2698
date: Tue, 23 Dec 2025 11:20:00 GMT
We sent a HEAD request this time, with curl -I, but only because this author did not want to paste a lot of garbage into this guide. Replace the -I with -i, or drop it entirely, and marvel at the unintelligible junk iocaine generates out of its own source code!
Adjusting the configuration
This is all fine and great, and will already stop a lot of crawlers, but there’s still that warning about ai-robots-txt-path. You see, iocaine ships with a copy of ai.robots.txt’s robots.json, to ward off crawlers that identify themselves. But iocaine’s copy is only updated when a new iocaine release is cut, and we may wish to update it more often than that.
To do so, let’s grab the most recent copy of it, directly from ai.robots.txt’s main branch:
# curl -L https://github.com/ai-robots-txt/ai.robots.txt/raw/refs/heads/main/robots.json \
-o /opt/iocaine/share/data/ai.robots.txt-robots.json
Previously, we’ve seen a warning about ai-robots-txt-path not being configured. Now that we have a copy of this file, let’s tell iocaine about it. The way to do that is through partial configuration snippets: we can place files with partial configuration into a directory, tell iocaine about said directory, and it will merge them all. If we do not give iocaine a configuration file to load, it will use its embedded default. We can use this to our advantage, and extend the default configuration!
Let’s have a look at that default again!
# /opt/iocaine/bin/iocaine show config
initial-seed ""
state-directory ""
http-server default {
bind "127.0.0.1:42069"
use handler-from=default
}
declare-handler default language=roto
The ai-robots-txt-path setting belongs to the “handler”, so that’s what we need to apply configuration to. Doing so is simple:
# cat >/opt/iocaine/etc/iocaine/config.d/00-ai.robots.txt.kdl <<EOF
declare-handler default {
    ai-robots-txt-path "/opt/iocaine/share/data/ai.robots.txt-robots.json"
}
EOF
We placed a configuration snippet into /opt/iocaine/etc/iocaine/config.d/00-ai.robots.txt.kdl that tells the script where to find the list we just downloaded. On its own, iocaine has no idea where to look for configuration, so we have to tell it. Before we go ahead and run it, let’s look at the merged configuration:
# /opt/iocaine/bin/iocaine -c /opt/iocaine/etc/iocaine/config.d show config
initial-seed ""
state-directory ""
http-server default {
bind "127.0.0.1:42069"
use handler-from=default
}
declare-handler default language=roto {
ai-robots-txt-path "/opt/iocaine/share/data/ai.robots.txt-robots.json"
}
It picked up our configuration! Let’s see if it worked:
# RUST_LOG=iocaine=info,iocaine::user=trace /opt/iocaine/bin/iocaine -c /opt/iocaine/etc/iocaine/config.d
2025-12-23T11:22:33.200951Z DEBUG iocaine::user: Registering metrics
2025-12-23T11:22:33.201076Z DEBUG iocaine::user: Loading ai-robots-txt from /opt/iocaine/share/data/ai.robots.txt-robots.json
2025-12-23T11:22:33.203508Z INFO iocaine::user: using default unwanted asns
2025-12-23T11:22:33.203556Z WARN iocaine::user: No unwanted-asns.db-path configured, check disabled
2025-12-23T11:22:33.240064Z DEBUG iocaine::user: Loading embedded HTML template
2025-12-23T11:22:33.240170Z DEBUG iocaine::user: Initializing template engine
2025-12-23T11:22:33.240236Z INFO iocaine::user: poison-id: D-1vTTGPXBeG8dH4Shj3_A
2025-12-23T11:22:33.269266Z INFO iocaine::morgue: starting iocaine
2025-12-23T11:22:33.269466Z INFO iocaine::morgue: iocaine ready
With this set up, we can send any of the bots listed in ai.robots.txt into our infinite maze of Rusty garbage!
# curl -Is http://127.0.0.1:42069/ -A ClaudeBot
HTTP/1.1 200 OK
content-type: text/html
content-length: 2698
date: Tue, 23 Dec 2025 11:22:53 GMT
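Curious what ai-robots-txt-path actually hands to the handler? robots.json is a JSON object keyed by crawler name. The Python sketch below pulls those names out of such a file and checks a User-Agent against them. The inline sample is made up, and the case-insensitive substring rule is our own assumption for illustration; iocaine’s built-in handler may match differently. To try it on the real list, read /opt/iocaine/share/data/ai.robots.txt-robots.json instead of the sample.

```python
from __future__ import annotations

import json

# A made-up excerpt shaped like robots.json: a JSON object keyed by crawler name.
# The per-bot fields are placeholders, not copied from the real file.
SAMPLE_ROBOTS_JSON = """
{
    "ClaudeBot": {"operator": "placeholder"},
    "GPTBot": {"operator": "placeholder"}
}
"""


def load_bot_names(text: str) -> list[str]:
    """Return the crawler names, i.e. the top-level keys of robots.json."""
    return list(json.loads(text))


def matching_bot(user_agent: str, bot_names: list[str]) -> str | None:
    """Hypothetical rule: first name found anywhere in the User-Agent, ignoring case."""
    ua = user_agent.lower()
    for name in bot_names:
        if name.lower() in ua:
            return name
    return None


names = load_bot_names(SAMPLE_ROBOTS_JSON)
print(matching_bot("ClaudeBot", names))  # ClaudeBot
print(matching_bot("Mozilla/5.0 (compatible; GPTBot/1.2)", names))  # GPTBot
print(matching_bot("curl/8.5.0", names))  # None
```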
The built-in handler supports a whole lot of options; we’re not going to repeat all of them here, only some of the more important ones, like the ai-robots-txt-path option we’ve just played with.
A better corpus
Out of the box, iocaine will use its own source code as its corpus. While that does end up generating completely nonsensical garbage, it’s neither a particularly big nor a particularly varied corpus. We could be using something better. How about Orwell’s 1984, combined with Huxley’s Brave New World?
Let’s grab a copy of these from archive.org!
# curl -L https://archive.org/download/GeorgeOrwells1984/1984_djvu.txt \
-o /opt/iocaine/share/data/1984.txt
# curl -L https://archive.org/download/ost-english-brave_new_world_aldous_huxley/Brave_New_World_Aldous_Huxley_djvu.txt \
-o /opt/iocaine/share/data/brave-new-world.txt
We could use the books above as our wordlist too, but a dedicated word list collection will give us a larger set of words. We’re going to grab one from miscfiles:
# curl -L https://git.savannah.gnu.org/cgit/miscfiles.git/plain/web2 \
-o /opt/iocaine/share/data/words.txt
We’ll need to tell iocaine to use these, too, so let’s drop another configuration snippet into, say, /opt/iocaine/etc/iocaine/config.d/01-sources.kdl:
# cat >/opt/iocaine/etc/iocaine/config.d/01-sources.kdl <<EOF
declare-handler default {
    sources {
        training-corpus "/opt/iocaine/share/data/1984.txt" \
                        "/opt/iocaine/share/data/brave-new-world.txt"
        wordlists "/opt/iocaine/share/data/words.txt"
    }
}
EOF
If we restart iocaine now, the generated garbage will be far less rusty.
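Why does the corpus matter so much? One common way to turn a body of text into endless plausible-looking nonsense is a Markov chain: record which words follow which, then take a random walk. The toy word-level chain below is only an illustration of that general technique, not iocaine’s actual generator; the function names and the restart-at-dead-ends rule are our own choices. A bigger, more varied corpus simply gives the walk more transitions to choose from.

```python
from __future__ import annotations

import random
from collections import defaultdict


def train(corpus: str) -> dict[str, list[str]]:
    """Map every word to the words that follow it somewhere in the corpus."""
    words = corpus.split()
    chain: dict[str, list[str]] = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return dict(chain)


def babble(chain: dict[str, list[str]], length: int, seed: int) -> str:
    """Random walk over the chain; jump to a random known word at dead ends."""
    rng = random.Random(seed)  # same seed, same garbage
    starts = sorted(chain)
    word = rng.choice(starts)
    out = [word]
    while len(out) < length:
        successors = chain.get(word)
        word = rng.choice(successors) if successors else rng.choice(starts)
        out.append(word)
    return " ".join(out)


corpus = "war is peace freedom is slavery ignorance is strength"
chain = train(corpus)
print(chain["is"])  # ['peace', 'slavery', 'strength']
print(babble(chain, 12, seed=1984))
```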
Observing the Crawlers
The built-in handler supports metrics too, and if a prometheus-server is configured, it will make a number of metrics available (along with the usual process metrics):
- qmk_requests{host}: The number of requests served, keyed by host.
- qmk_ruleset_hits{ruleset, outcome}: The number of times a particular rule was hit, and its outcome. The outcome is either garbage or default, and the rulesets are ai.robots.txt, major-browsers, unwanted-visitors, or default.
- qmk_garbage_generated{host}: The amount of garbage generated, in bytes, keyed by host.
We like big numbers and pretty graphs, so while there is no example dashboard (yet), we can still enable a Prometheus server, and start collecting! We’ll also tell iocaine to persist these metrics, so that we don’t start from zero every time iocaine is restarted.
Let’s drop the following configuration snippet into /opt/iocaine/etc/iocaine/config.d/02-metrics.kdl:
# cat >/opt/iocaine/etc/iocaine/config.d/02-metrics.kdl <<EOF
prometheus-server metrics {
    bind "127.0.0.1:42042"
    persist-path "/opt/iocaine/var/lib/iocaine/qmk-metrics.json"
    persist-interval "1h"
}
http-server default {
    use metrics=metrics
}
EOF
Once we restart iocaine, the metrics will be available immediately at http://127.0.0.1:42042/metrics:
# curl -s http://127.0.0.1:42042/metrics | grep '^iocaine_version'
iocaine_version{version="3.1.0"} 1
The metrics mentioned above will appear in this listing as soon as iocaine has seen some traffic. Let’s give it some, and check the metrics!
# curl -s http://127.0.0.1:42069/ >/dev/null
# curl -s http://127.0.0.1:42069/ -A ClaudeBot >/dev/null
# curl -s http://127.0.0.1:42069/ -A Perplexity >/dev/null
# curl -s http://127.0.0.1:42069/ -A "Mozilla/5.0 Firefox/0" >/dev/null
# curl -s http://127.0.0.1:42042/metrics | grep '^qmk_'
qmk_garbage_generated{host="127.0.0.1:42069"} 5370
qmk_requests{host="127.0.0.1:42069"} 4
qmk_ruleset_hits{outcome="default",ruleset="default"} 1
qmk_ruleset_hits{outcome="garbage",ruleset="ai.robots.txt"} 1
qmk_ruleset_hits{outcome="garbage",ruleset="major-browsers"} 1
qmk_ruleset_hits{outcome="garbage",ruleset="unwanted-visitors"} 1
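Until there’s a dashboard, the Prometheus text format is easy enough to pick apart for a quick number. Here’s a small Python sketch (a simplified parser, not a full implementation of the exposition format; escaped quotes in label values, for example, aren’t handled) that sums qmk_ruleset_hits per outcome, using the output above as input. To run it against the live endpoint, fetch http://127.0.0.1:42042/metrics first, for instance with urllib.request.urlopen.

```python
from __future__ import annotations

import re

# name{labels} value [timestamp]: enough for the qmk_* lines shown above.
SAMPLE_RE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)(?:\s+\S+)?$"
)
# Simplification: label values containing escaped quotes are not handled.
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text: str):
    """Yield (name, labels, value) for each sample line, skipping # comments."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match:
            labels = dict(LABEL_RE.findall(match.group("labels") or ""))
            yield match.group("name"), labels, float(match.group("value"))


def hits_by_outcome(text: str) -> dict[str, float]:
    """Sum qmk_ruleset_hits per outcome ("garbage" or "default")."""
    totals: dict[str, float] = {}
    for name, labels, value in parse_samples(text):
        if name == "qmk_ruleset_hits":
            outcome = labels.get("outcome", "unknown")
            totals[outcome] = totals.get(outcome, 0.0) + value
    return totals


scraped = """
qmk_requests{host="127.0.0.1:42069"} 4
qmk_ruleset_hits{outcome="default",ruleset="default"} 1
qmk_ruleset_hits{outcome="garbage",ruleset="ai.robots.txt"} 1
qmk_ruleset_hits{outcome="garbage",ruleset="major-browsers"} 1
qmk_ruleset_hits{outcome="garbage",ruleset="unwanted-visitors"} 1
"""
totals = hits_by_outcome(scraped)
print(totals)  # {'default': 1.0, 'garbage': 3.0}
print(totals["garbage"] / sum(totals.values()))  # 0.75
```

Three of our four test requests ended up in the maze: the ClaudeBot, Perplexity, and fake Firefox ones.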
Setting up the service
Our iocaine has a decent training corpus now, uses ai.robots.txt’s robot list, and even has metrics enabled: it is time to stop running it by hand, and set up a systemd service. If you’re on a BSD, or on a Linux distribution that does not use systemd, this section will not be of much use; you’ll have to figure out how your init system of choice can manage iocaine.
For the rest of us, there’s a good template in the sources that we can base ours on. Let me show you a magic trick!
# /opt/iocaine/bin/iocaine show embeds -c /iocaine.service \
| tee /opt/iocaine/etc/systemd/iocaine.service
We can enable it with systemctl directly:
# systemctl enable /opt/iocaine/etc/systemd/iocaine.service
Created symlink '/etc/systemd/system/iocaine.service' → '/opt/iocaine/etc/systemd/iocaine.service'.
Created symlink '/etc/systemd/system/multi-user.target.wants/iocaine.service' → '/opt/iocaine/etc/systemd/iocaine.service'.
The service file, as extracted, is not suitable for our needs, as it hardcodes different paths than the ones we’ll be using. Thankfully, enabling the service doesn’t start it, so we can adjust it first. Like iocaine, systemd supports partial configuration too (drop-in files for units), and systemctl edit iocaine.service makes creating one easy:
# systemctl edit --stdin iocaine <<EOF
[Service]
ExecStart=
ExecStart=/opt/iocaine/bin/iocaine --config-path /opt/iocaine/etc/iocaine/config.d start
EOF
Successfully installed edited file '/etc/systemd/system/iocaine.service.d/override.conf'.
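Why the empty ExecStart= line? For ExecStart=, each assignment adds a command, and assigning the empty string resets the list collected so far; without the reset, our drop-in would add a second ExecStart= next to the original one, which systemd rejects for a regular (non-oneshot) service. The Python sketch below models only that one rule across a unit and a drop-in, applied in order. It is not how systemd parses units, and the original unit’s command line is a hypothetical placeholder, not the contents of the embedded iocaine.service.

```python
from __future__ import annotations


def exec_start(*fragments: str) -> list[str]:
    """Collect ExecStart= values across unit fragments, applied in order.

    Models one systemd rule only: each assignment appends, an empty one resets.
    """
    commands: list[str] = []
    for fragment in fragments:
        for line in fragment.splitlines():
            key, sep, value = line.strip().partition("=")
            if sep and key == "ExecStart":
                if value == "":
                    commands.clear()  # "ExecStart=" with no value drops earlier entries
                else:
                    commands.append(value)
    return commands


# Hypothetical placeholder for the unit shipped with iocaine.
original_unit = "[Service]\nExecStart=/usr/bin/iocaine start\n"
override = (
    "[Service]\n"
    "ExecStart=\n"
    "ExecStart=/opt/iocaine/bin/iocaine --config-path /opt/iocaine/etc/iocaine/config.d start\n"
)
print(exec_start(original_unit, override))
```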
We have a slight problem now, though: we initially configured our metrics to persist to a directory under /opt/iocaine, but systemd creates a dynamic user for iocaine, and that user can’t easily write there. It is in our best interest to save metrics to a file this user can write to under these circumstances. Let’s change the iocaine configuration!
We have two options here: we can either edit /opt/iocaine/etc/iocaine/config.d/02-metrics.kdl, and change persist-path to /var/lib/iocaine/qmk-metrics.json, or we can place an additional snippet into /opt/iocaine/etc/iocaine/config.d/03-systemd.kdl. For this guide, we’ll do the latter, and drop an override into said file:
# cat >/opt/iocaine/etc/iocaine/config.d/03-systemd.kdl <<EOF
prometheus-server metrics {
    persist-path "qmk-metrics.json"
}
EOF
The path is simply the filename, because the service ensures that iocaine runs with /var/lib/iocaine as its working directory. We won’t bother copying over any metrics persisted during our manual runs.
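How does 03-systemd.kdl manage to override a value set in 02-metrics.kdl? The snippets are merged, and the later one wins. The Python sketch below models that idea with nested dictionaries standing in for the parsed KDL, merged in lexical filename order. Both the dictionary representation and the ordering rule are assumptions for illustration; iocaine’s real merge logic may treat some nodes (repeated nodes or arguments, for example) differently.

```python
from __future__ import annotations

from copy import deepcopy


def deep_merge(base: dict, override: dict) -> dict:
    """New dict where nested dicts merge recursively and other values are replaced."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def merge_snippets(snippets: dict[str, dict]) -> dict:
    """Apply snippets in lexical filename order, so 03-* wins over 02-*."""
    result: dict = {}
    for filename in sorted(snippets):
        result = deep_merge(result, snippets[filename])
    return result


# Hypothetical dict stand-ins for the parsed KDL of our two snippets.
snippets = {
    "03-systemd.kdl": {"prometheus-server": {"metrics": {"persist-path": "qmk-metrics.json"}}},
    "02-metrics.kdl": {
        "prometheus-server": {
            "metrics": {
                "bind": "127.0.0.1:42042",
                "persist-path": "/opt/iocaine/var/lib/iocaine/qmk-metrics.json",
                "persist-interval": "1h",
            }
        }
    },
}
print(merge_snippets(snippets)["prometheus-server"]["metrics"])
```

The bind address and persist interval from 02-metrics.kdl survive, while persist-path now comes from 03-systemd.kdl.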
With both iocaine and its systemd service configured, we can start it up:
# systemctl start iocaine
Nothing extraordinary here; this is how you start any systemd service, after all. Let’s see if it works!
# systemctl status iocaine
● iocaine.service - iocaine, the deadliest poison known to AI
Loaded: loaded (/etc/systemd/system/iocaine.service; enabled; preset: enabled)
Drop-In: /etc/systemd/system/iocaine.service.d
└─override.conf
Active: active (running) since Tue 2025-12-23 11:28:27 UTC; 11s ago
Invocation: c724e35d37434cd8934951bbdc3e9a56
Main PID: 1224 (iocaine)
Tasks: 2 (limit: 4657)
Memory: 44.6M (peak: 47M)
CPU: 220ms
CGroup: /system.slice/iocaine.service
└─1224 /opt/iocaine/bin/iocaine --config-path /opt/iocaine/etc/iocaine/config.d start
Dec 23 11:28:27 localhost systemd[1]: Starting iocaine.service - iocaine, the deadliest poison known to AI...
Dec 23 11:28:27 localhost iocaine[1224]: 2025-12-23T11:28:27.147914Z WARN iocaine::user: No unwanted-asns.db-path configured, check disabled
Dec 23 11:28:27 localhost systemd[1]: Started iocaine.service - iocaine, the deadliest poison known to AI.
# curl -is http://127.0.0.1:42069/
HTTP/1.1 421 Misdirected Request
content-length: 0
date: Tue, 23 Dec 2025 11:29:14 GMT
Lovely!
Caddy configuration
We’re done with the hard parts now: iocaine is up and running, managed by systemd. We even have metrics (though we haven’t set up a dashboard for them yet)! It’s time to place it in front of our stuff. For the purpose of this guide, we’ll be using Caddy. The examples shown herein are intentionally limited, as this is a getting started guide, not a Caddy mastery one.
Important
To see how to use iocaine with other reverse proxies and similar tools, see the reverse proxy guides.
Let’s say we have a simple, static blog, and we’d prefer if the crawlers didn’t reach it. Let’s roll up our sleeves and etch these magic words into our Caddyfile:
blog.example.com {
    @read method GET HEAD
    reverse_proxy @read 127.0.0.1:42069 {
        @fallback status 421
        handle_response @fallback
    }
    root /var/www/blog.example.com
    file_server
}
This will send GET and HEAD requests to iocaine first, and if iocaine instructs Caddy (through an HTTP 421 Misdirected Request response) to serve real contents, we’ll tell Caddy to fall back to serving static files from /var/www/blog.example.com.
Putting iocaine in front of something that isn’t a static site, and which we have to reverse proxy instead, is very similar. Let’s imagine a GoToSocial instance! We want to send GET and HEAD requests through iocaine, but route other methods directly to GoToSocial. Here’s one way to do just that:
sloth.example.com {
    @read method GET HEAD
    reverse_proxy @read 127.0.0.1:42069 {
        @fallback status 421
        handle_response @fallback
    }
    reverse_proxy 127.0.0.1:8080
}
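Stripped of Caddy syntax, both site blocks make the same decision for every request. As an illustration only (this is not how Caddy works internally), here’s that control flow as a small Python sketch, with hypothetical functions standing in for iocaine and for the real backend, be it file_server or GoToSocial:

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Response:
    status: int
    body: str


def route(
    method: str,
    ask_iocaine: Callable[[], Response],
    serve_real: Callable[[], Response],
) -> Response:
    """Model of the site blocks above: only reads go to iocaine; a 421 falls through."""
    if method in ("GET", "HEAD"):  # @read method GET HEAD
        verdict = ask_iocaine()  # reverse_proxy @read 127.0.0.1:42069
        if verdict.status != 421:  # not @fallback: iocaine answered itself
            return verdict
    return serve_real()  # file_server, or reverse_proxy 127.0.0.1:8080


# Hypothetical stand-ins for iocaine's two kinds of answers and for the real backend.
def iocaine_says_misdirected() -> Response:
    return Response(421, "")


def iocaine_serves_garbage() -> Response:
    return Response(200, "rusty maze garbage")


def real_site() -> Response:
    return Response(200, "the actual blog post")


print(route("GET", iocaine_says_misdirected, real_site).body)  # the actual blog post
print(route("GET", iocaine_serves_garbage, real_site).body)  # rusty maze garbage
print(route("POST", iocaine_serves_garbage, real_site).body)  # the actual blog post
```

Note that a POST never reaches iocaine at all, which is exactly why the @read matcher is there.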
Final remarks
But this is where this guide will end: it’s not a guide to teach you dark Caddy secrets. It’s a guide to get started with Caddy and iocaine, and we’re up and running! Look at us playing with the configuration: isn’t operations fun?
Now go, tweak it further, watch the metrics, and see the crawlers get trapped in the maze.