Getting started with iocaine

Overview

In this short guide, we will set up iocaine from scratch, using its built-in handler. Despite its simplicity, the built-in handler is still powerful, and in this author’s experience, will route the vast majority of unwanted visitors into iocaine’s maze.

We will not be writing a custom request handler; that will be a separate guide.

  Important

On these pages, we’ll document the process of setting up iocaine running directly on the host, with systemd. If you want to run iocaine in a container, see the Getting Started with iocaine & containers guide. If you’re on Debian (or a derivative), see the Debian guide. If you’re using NixOS, there’s a guide for that, too.

Requirements

The requirements below are for this guide, not necessarily for iocaine itself. It can be used with reverse proxies other than Caddy, and works on Linux systems without systemd, and even on non-Linux systems like FreeBSD. A lot of this guide will apply to systems that fall outside of what’s documented here, but some parts will have to be adapted.

Throughout this guide, we’ll be using /opt/iocaine: we’ll download a pre-built binary there, place our custom configuration under it, and so on. To make things easier, the following layout is what this guide will assume going forward:

# tree -d /opt/iocaine
/opt/iocaine
├── bin
├── etc
│   ├── iocaine
│   │   └── config.d
│   └── systemd
├── share
│   └── data
└── var
    └── lib
        └── iocaine

Let’s create those directories first!

# mkdir -p /opt/iocaine/bin \
           /opt/iocaine/etc/iocaine/config.d /opt/iocaine/etc/systemd \
           /opt/iocaine/share/data \
           /opt/iocaine/var/lib/iocaine

Getting familiar with iocaine

Before we begin our journey of configuring iocaine, let us take a moment to see what it can do out of the box, without any configuration whatsoever. We’ll download a pre-built binary into /opt/iocaine/bin, and take it for a spin.

Let’s grab that binary! Assuming we’re on an x86_64 or aarch64 Linux system, we can do that like this:

# curl -sL \
    https://iocaine.madhouse-project.org/_/v3.2.0/tarball/$(uname -m)-linux \
  | tar -C /opt/iocaine/bin --zstd -x
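
Piping curl straight into tar is fine for a quick try, but if the download fails, tar ends up chewing on an error page. A slightly more defensive sketch could look like the one below. The function names are hypothetical helpers and not part of iocaine, and the URL pattern is simply the one from the command above.

```shell
#!/bin/sh
# Hypothetical helpers; the URL pattern is copied from the command above.

# Build the tarball URL for a given version and machine type.
# The machine type defaults to whatever `uname -m` reports.
iocaine_tarball_url() {
  version="$1"
  machine="${2:-$(uname -m)}"
  printf 'https://iocaine.madhouse-project.org/_/v%s/tarball/%s-linux\n' \
    "$version" "$machine"
}

# Download to a temporary file first, and only unpack if curl succeeded.
# -f makes curl exit non-zero on HTTP errors instead of saving the error page.
fetch_iocaine() {
  version="$1"
  dest="$2"
  tmp="$(mktemp)" || return 1
  if curl -fsSL "$(iocaine_tarball_url "$version")" -o "$tmp"; then
    tar -C "$dest" --zstd -x -f "$tmp"
    status=$?
  else
    status=1
  fi
  rm -f "$tmp"
  return "$status"
}

# Example: fetch_iocaine 3.2.0 /opt/iocaine/bin
```

Nothing is downloaded until fetch_iocaine is called, so the snippet can be pasted into a shell and tried with a version and target directory of our choosing.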

Let’s start it up! It will print a couple of warning messages, and keep running. We’ll talk more about the first of those warnings a little later.

# /opt/iocaine/bin/iocaine
2025-12-23T11:18:00.534604Z  WARN iocaine::user: No ai-robots-txt-path configured, using default
2025-12-23T11:18:00.537044Z  WARN iocaine::user: No unwanted-asns.db-path configured, check disabled

We can get more logs out of it by setting the RUST_LOG environment variable to iocaine=trace (the supported log levels are, in order of decreasing verbosity: trace, debug, info, warn, and error):

# RUST_LOG=iocaine=trace /opt/iocaine/bin/iocaine
2025-12-23T11:18:33.666834Z DEBUG iocaine::sex_dungeon::means_of_production: using the embedded handler
2025-12-23T11:18:33.669592Z TRACE iocaine::sex_dungeon::means_of_production: compiling init
2025-12-23T11:18:33.748482Z TRACE iocaine::sex_dungeon::means_of_production: compilation finished
2025-12-23T11:18:33.748542Z TRACE iocaine::sex_dungeon::means_of_production: running init
2025-12-23T11:18:33.748567Z DEBUG iocaine::user: Registering metrics
2025-12-23T11:18:33.748712Z  WARN iocaine::user: No ai-robots-txt-path configured, using default
2025-12-23T11:18:33.751172Z  INFO iocaine::user: using default unwanted asns
2025-12-23T11:18:33.751228Z  WARN iocaine::user: No unwanted-asns.db-path configured, check disabled
2025-12-23T11:18:33.786535Z DEBUG iocaine::user: Loading embedded HTML template
2025-12-23T11:18:33.786599Z DEBUG iocaine::user: Initializing template engine
2025-12-23T11:18:33.786647Z  INFO iocaine::user: poison-id: D-1vTTGPXBeG8dH4Shj3_A
2025-12-23T11:18:33.786782Z TRACE iocaine::sex_dungeon::means_of_production: init finished
2025-12-23T11:18:33.788322Z TRACE iocaine::sex_dungeon::means_of_production: compiling the main script
2025-12-23T11:18:33.814856Z TRACE iocaine::sex_dungeon::means_of_production: compilation finished
2025-12-23T11:18:33.815439Z  INFO iocaine::morgue: starting iocaine
2025-12-23T11:18:33.815655Z  INFO iocaine::morgue: iocaine ready

We can do all kinds of fancy stuff with logging, like setting the log level to trace for iocaine::user messages, to info for iocaine::morgue, and to warn for anything else under iocaine:

# RUST_LOG=iocaine=warn,iocaine::morgue=info,iocaine::user=trace /opt/iocaine/bin/iocaine
2025-12-23T11:19:04.426970Z DEBUG iocaine::user: Registering metrics
2025-12-23T11:19:04.427307Z  WARN iocaine::user: No ai-robots-txt-path configured, using default
2025-12-23T11:19:04.429683Z  INFO iocaine::user: using default unwanted asns
2025-12-23T11:19:04.429725Z  WARN iocaine::user: No unwanted-asns.db-path configured, check disabled
2025-12-23T11:19:04.465190Z DEBUG iocaine::user: Loading embedded HTML template
2025-12-23T11:19:04.465330Z DEBUG iocaine::user: Initializing template engine
2025-12-23T11:19:04.465428Z  INFO iocaine::user: poison-id: D-1vTTGPXBeG8dH4Shj3_A
2025-12-23T11:19:04.494161Z  INFO iocaine::morgue: starting iocaine
2025-12-23T11:19:04.494363Z  INFO iocaine::morgue: iocaine ready

Okay, that was an educational side quest, but: does it work? We can curl that (in another terminal)!

# curl -is http://127.0.0.1:42069/
HTTP/1.1 421 Misdirected Request
content-length: 0
date: Tue, 23 Dec 2025 11:19:27 GMT

Why that port? Because that’s where iocaine binds to by default! Let’s check its config:

# /opt/iocaine/bin/iocaine show config
initial-seed ""
state-directory ""
http-server default {
    bind "127.0.0.1:42069"
    use handler-from=default
}
declare-handler default language=roto

The “421 Misdirected Request” response is a signal that real contents should be served. We can also test what happens if we send it a request it deems garbage:

# curl -Is http://127.0.0.1:42069/ -A Perplexity
HTTP/1.1 200 OK
content-type: text/html
content-length: 2698
date: Tue, 23 Dec 2025 11:20:00 GMT

We sent a HEAD request this time, with curl -I, but only because this author did not want to paste a lot of garbage into this guide. Replace the -I with -i, or drop it entirely, and marvel at the unintelligible junk iocaine generates out of its own source code!
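
To try a handful of user agents without squinting at headers each time, the two checks above can be wrapped into a small helper. This is only a sketch: verdict_for_status and probe are hypothetical names, and the mapping assumes nothing beyond what we have just seen (421 means real content should be served, 200 means garbage was served).

```shell
#!/bin/sh
# Hypothetical helpers for poking at a running iocaine.

# Translate the HTTP status into what it means in this guide:
# 421 = real content should be served, 200 = garbage was served.
verdict_for_status() {
  case "$1" in
    421) echo "real content" ;;
    200) echo "garbage" ;;
    *)   echo "unexpected status: $1" ;;
  esac
}

# Send a HEAD request with the given User-Agent, and print the verdict.
# The URL defaults to iocaine's default bind address.
probe() {
  agent="$1"
  url="${2:-http://127.0.0.1:42069/}"
  code="$(curl -Is -o /dev/null -w '%{http_code}' -A "$agent" "$url")"
  printf '%s: %s\n' "$agent" "$(verdict_for_status "$code")"
}

# Example: probe Perplexity
```

If iocaine is not running, curl reports a status of 000, which shows up as an unexpected status rather than as a verdict.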

Adjusting the configuration

This is all fine and great, and will stop a lot of the crawlers, but there’s still this warning about ai-robots-txt-path. You see, iocaine ships with a copy of ai.robots.txt’s robots.json, to ward off crawlers that identify themselves. But iocaine’s copy is only updated when a new iocaine release is cut - we may wish to update it more often than that.

To do so, let’s grab the most recent copy of it, directly from ai.robots.txt’s main branch:

# curl -L https://github.com/ai-robots-txt/ai.robots.txt/raw/refs/heads/main/robots.json \
       -o /opt/iocaine/share/data/ai.robots.txt-robots.json
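
If we want to refresh this file regularly (from cron or a systemd timer, say), it is worth making sure a failed download never replaces a good copy. Here is a minimal sketch; refresh_json and looks_like_json_object are hypothetical helpers, and the sanity check assumes the file is a JSON object, so it only looks at the first character. Whether iocaine needs a restart to pick up the refreshed file is not something this sketch addresses.

```shell
#!/bin/sh
# Hypothetical helpers: refresh a JSON file from a URL without ever leaving
# a broken or half-written copy in place.

# Crude sanity check: the file is non-empty and starts with "{".
# This is NOT a JSON validator.
looks_like_json_object() {
  [ -s "$1" ] && [ "$(head -c 1 "$1")" = "{" ]
}

refresh_json() {
  url="$1"
  dest="$2"
  # Create the temporary file next to the destination, so the final mv
  # is a rename on the same filesystem.
  tmp="$(mktemp "${dest}.XXXXXX")" || return 1
  # -f: fail on HTTP errors rather than saving the error page.
  if curl -fsSL "$url" -o "$tmp" && looks_like_json_object "$tmp"; then
    # mktemp creates the file readable by its owner only; open it up so
    # a service running as a different user can still read it.
    chmod 644 "$tmp"
    mv "$tmp" "$dest"
  else
    rm -f "$tmp"
    return 1
  fi
}

# Example:
# refresh_json \
#   https://github.com/ai-robots-txt/ai.robots.txt/raw/refs/heads/main/robots.json \
#   /opt/iocaine/share/data/ai.robots.txt-robots.json
```

The chmod matters more than it looks: without it, the refreshed file would be readable only by the user who ran the download.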

Previously, we’ve seen a warning about ai-robots-txt-path not being configured. Now that we have a copy of this file, let’s tell iocaine about it. The way to do that is through partial configuration snippets: we can place files with partial configuration into a directory, tell iocaine about said directory, and it will merge them all. If we do not give iocaine a configuration file to load, it will use its embedded default. We can use this to our advantage, and extend the default configuration!

Let’s have a look at that default again!

# /opt/iocaine/bin/iocaine show config
initial-seed ""
state-directory ""
http-server default {
    bind "127.0.0.1:42069"
    use handler-from=default
}
declare-handler default language=roto

It’s the “handler” we need to apply configuration to. Doing so is simple:

# cat >/opt/iocaine/etc/iocaine/config.d/00-ai.robots.txt.kdl <<EOF
declare-handler default {
  ai-robots-txt-path "/opt/iocaine/share/data/ai.robots.txt-robots.json"
}
EOF

We placed a configuration snippet into /opt/iocaine/etc/iocaine/config.d/00-ai.robots.txt.kdl that tells the script where to find the list we just downloaded. On its own, iocaine has no idea where to look for configuration, so we have to tell it. Before we go ahead and run it, let’s look at the merged configuration:

# /opt/iocaine/bin/iocaine -c /opt/iocaine/etc/iocaine/config.d show config
initial-seed ""
state-directory ""
http-server default {
    bind "127.0.0.1:42069"
    use handler-from=default
}
declare-handler default language=roto {
    ai-robots-txt-path "/opt/iocaine/share/data/ai.robots.txt-robots.json"
}

It picked up our configuration! Let’s see if it worked:

# RUST_LOG=iocaine=info,iocaine::user=trace /opt/iocaine/bin/iocaine -c /opt/iocaine/etc/iocaine/config.d
2025-12-23T11:22:33.200951Z DEBUG iocaine::user: Registering metrics
2025-12-23T11:22:33.201076Z DEBUG iocaine::user: Loading ai-robots-txt from /opt/iocaine/share/data/ai.robots.txt-robots.json
2025-12-23T11:22:33.203508Z  INFO iocaine::user: using default unwanted asns
2025-12-23T11:22:33.203556Z  WARN iocaine::user: No unwanted-asns.db-path configured, check disabled
2025-12-23T11:22:33.240064Z DEBUG iocaine::user: Loading embedded HTML template
2025-12-23T11:22:33.240170Z DEBUG iocaine::user: Initializing template engine
2025-12-23T11:22:33.240236Z  INFO iocaine::user: poison-id: D-1vTTGPXBeG8dH4Shj3_A
2025-12-23T11:22:33.269266Z  INFO iocaine::morgue: starting iocaine
2025-12-23T11:22:33.269466Z  INFO iocaine::morgue: iocaine ready

With this set up, we can send any of the bots listed in ai.robots.txt into our infinite maze of Rusty garbage!

# curl -Is http://127.0.0.1:42069/ -A ClaudeBot
HTTP/1.1 200 OK
content-type: text/html
content-length: 2698
date: Tue, 23 Dec 2025 11:22:53 GMT

There are a whole lot of options the built-in handler supports - we’re not going to repeat all of them here, only some of the more important ones, like the ai-robots-txt-path option we’ve just played with.

A better corpus

Out of the box, iocaine will use its own source code as its corpus. While that does end up generating completely nonsensical garbage, it’s neither a particularly big nor a particularly varied corpus. We could be using something better. How about Orwell’s 1984, combined with Huxley’s Brave New World?

Let’s grab a copy of these from archive.org!

# curl -L https://archive.org/download/GeorgeOrwells1984/1984_djvu.txt \
       -o /opt/iocaine/share/data/1984.txt
# curl -L https://archive.org/download/ost-english-brave_new_world_aldous_huxley/Brave_New_World_Aldous_Huxley_djvu.txt \
       -o /opt/iocaine/share/data/brave-new-world.txt

We could use the above books as our wordlist too, but we’ll get a larger set of words out of a wordlist collection. We’re gonna grab one from miscfiles:

# curl -L https://git.savannah.gnu.org/cgit/miscfiles.git/plain/web2 \
       -o /opt/iocaine/share/data/words.txt

We’ll need to tell iocaine to use these, too, so let’s drop another configuration snippet into, say, /opt/iocaine/etc/iocaine/config.d/01-sources.kdl:

# cat >/opt/iocaine/etc/iocaine/config.d/01-sources.kdl <<EOF
declare-handler default {
  sources {
    training-corpus "/opt/iocaine/share/data/1984.txt" \
                    "/opt/iocaine/share/data/brave-new-world.txt"
    wordlists "/opt/iocaine/share/data/words.txt"
  }
}
EOF

If we restart iocaine now, the generated garbage will be far less rusty.

Observing the Crawlers

The built-in handler supports metrics too, and if a prometheus-server is configured, it will make a number of metrics available (along with the usual process metrics):

qmk_requests{host}

The number of requests served, keyed by host.

qmk_ruleset_hits{ruleset, outcome}

Number of times a particular ruleset was hit, and its outcome. The outcome is either garbage or default, and the rulesets are ai.robots.txt, major-browsers, unwanted-visitors, or default.

qmk_garbage_generated{host}

Amount of garbage generated, in bytes, keyed by host.

We like big numbers and pretty graphs, so while there is no example dashboard (yet), we can still enable a Prometheus server, and start collecting! We’ll also tell iocaine to persist these metrics, so that we don’t start from zero every time iocaine is restarted.

Let’s drop the following configuration snippet into /opt/iocaine/etc/iocaine/config.d/02-metrics.kdl:

# cat >/opt/iocaine/etc/iocaine/config.d/02-metrics.kdl <<EOF
prometheus-server metrics {
  bind "127.0.0.1:42042"
  persist-path "/opt/iocaine/var/lib/iocaine/qmk-metrics.json"
  persist-interval "1h"
}

http-server default {
  use metrics=metrics
}
EOF

Once we restart iocaine, the metrics will be available immediately at http://127.0.0.1:42042/metrics:

# curl -s http://127.0.0.1:42042/metrics | grep '^iocaine_version'
iocaine_version{version="3.1.0"} 1

The metrics mentioned above will appear in this listing as soon as iocaine has seen some traffic. Let’s give it some, and check the metrics!

# curl -s http://127.0.0.1:42069/ >/dev/null
# curl -s http://127.0.0.1:42069/ -A ClaudeBot >/dev/null
# curl -s http://127.0.0.1:42069/ -A Perplexity >/dev/null
# curl -s http://127.0.0.1:42069/ -A "Mozilla/5.0 Firefox/0" >/dev/null
# curl -s http://127.0.0.1:42042/metrics | grep '^qmk_'
qmk_garbage_generated{host="127.0.0.1:42069"} 5370
qmk_requests{host="127.0.0.1:42069"} 4
qmk_ruleset_hits{outcome="default",ruleset="default"} 1
qmk_ruleset_hits{outcome="garbage",ruleset="ai.robots.txt"} 1
qmk_ruleset_hits{outcome="garbage",ruleset="major-browsers"} 1
qmk_ruleset_hits{outcome="garbage",ruleset="unwanted-visitors"} 1
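
Until there is a dashboard, a quick summary on the command line can scratch the big-numbers itch. The sketch below sums qmk_ruleset_hits per outcome; hits_by_outcome is a hypothetical helper, and it assumes the metric lines look exactly like the ones in the listing above.

```shell
#!/bin/sh
# Hypothetical helper: read Prometheus text-format metrics on stdin and
# print the total number of qmk_ruleset_hits per outcome, sorted by name.
hits_by_outcome() {
  awk '
    # Only look at qmk_ruleset_hits lines that carry an outcome label.
    index($0, "qmk_ruleset_hits{") == 1 && match($0, /outcome="[^"]*"/) {
      # Strip the leading outcome=" (9 chars) and the trailing quote.
      outcome = substr($0, RSTART + 9, RLENGTH - 10)
      # The sample value is the last whitespace-separated field.
      total[outcome] += $NF
    }
    END { for (o in total) printf "%s %d\n", o, total[o] }
  ' | sort
}

# Example: curl -s http://127.0.0.1:42042/metrics | hits_by_outcome
```

Fed with the listing above, this reports one default hit and three garbage hits.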

Setting up the service

Our iocaine has a decent training corpus now, uses ai.robots.txt’s robot list, and even has metrics enabled - it is time to stop running it by hand, and set up a systemd service. If you’re on a BSD, or a Linux distribution that does not use systemd, you’ll have to figure out how your init system of choice can manage iocaine; this section will not be of much use.

For the rest of us, there’s a good template in the sources we can base ours on. Let me show you a magic trick!

# /opt/iocaine/bin/iocaine show embeds -c /iocaine.service \
    | tee /opt/iocaine/etc/systemd/iocaine.service

We can enable it with systemctl directly:

# systemctl enable /opt/iocaine/etc/systemd/iocaine.service
Created symlink '/etc/systemd/system/iocaine.service' → '/opt/iocaine/etc/systemd/iocaine.service'.
Created symlink '/etc/systemd/system/multi-user.target.wants/iocaine.service' → '/opt/iocaine/etc/systemd/iocaine.service'.

The service file, as extracted, is not suitable for our needs, as it hardcodes different paths than what we’ll be using. Thankfully, enabling the service doesn’t start it, so we can adjust it first. Like iocaine, systemd supports partial unit files too, and we can use systemctl edit iocaine.service to easily edit it!

# systemctl edit --stdin iocaine <<EOF
[Service]
ExecStart=
ExecStart=/opt/iocaine/bin/iocaine --config-path /opt/iocaine/etc/iocaine/config.d start
EOF
Successfully installed edited file '/etc/systemd/system/iocaine.service.d/override.conf'.

We have a slight problem now, though: we initially configured our metrics to persist to a directory under /opt/iocaine - but systemd creates a dynamic user for iocaine, and it would really be in our best interest to save metrics to a file that is easy to write to under these circumstances. Let’s change the iocaine configuration!

We have two options here: we can either edit /opt/iocaine/etc/iocaine/config.d/02-metrics.kdl, and change persist-path to /var/lib/iocaine/qmk-metrics.json, or we can place an additional snippet into /opt/iocaine/etc/iocaine/config.d/03-systemd.kdl. For this guide, we’ll do the latter, and drop an override into said file:

# cat >/opt/iocaine/etc/iocaine/config.d/03-systemd.kdl <<EOF
prometheus-server metrics {
  persist-path "qmk-metrics.json"
}
EOF

The path is simply the filename, because the service ensures iocaine is running with /var/lib/iocaine as its working directory. We’re not going to copy over our persisted metrics.

With both iocaine and its systemd service configured, we can start it up:

# systemctl start iocaine

Nothing extraordinary here: this is how you set up any systemd service, after all. Let’s see if it works!

# systemctl status iocaine
● iocaine.service - iocaine, the deadliest poison known to AI
     Loaded: loaded (/etc/systemd/system/iocaine.service; enabled; preset: enabled)
    Drop-In: /etc/systemd/system/iocaine.service.d
             └─override.conf
     Active: active (running) since Tue 2025-12-23 11:28:27 UTC; 11s ago
 Invocation: c724e35d37434cd8934951bbdc3e9a56
   Main PID: 1224 (iocaine)
      Tasks: 2 (limit: 4657)
     Memory: 44.6M (peak: 47M)
        CPU: 220ms
     CGroup: /system.slice/iocaine.service
             └─1224 /opt/iocaine/bin/iocaine --config-path /opt/iocaine/etc/iocaine/config.d start

Dec 23 11:28:27 localhost systemd[1]: Starting iocaine.service - iocaine, the deadliest poison known to AI...
Dec 23 11:28:27 localhost iocaine[1224]: 2025-12-23T11:28:27.147914Z  WARN iocaine::user: No unwanted-asns.db-path configured, check disabled
Dec 23 11:28:27 localhost systemd[1]: Started iocaine.service - iocaine, the deadliest poison known to AI.
# curl -is http://127.0.0.1:42069/
HTTP/1.1 421 Misdirected Request
content-length: 0
date: Tue, 23 Dec 2025 11:29:14 GMT

Lovely!

Caddy configuration

We’re done with the hard parts now: iocaine is up and running, managed by systemd. We even have metrics (though we haven’t set up a dashboard for them yet)! It’s time to place it in front of our stuff. For the purpose of this guide, we’ll be using Caddy. The examples shown herein are intentionally limited, as this is a getting started guide, not a Caddy mastery one.

  Important

To see how to use iocaine with other reverse proxies and similar tools, see the reverse proxy guides.

Let’s say we have a simple blog, a static one, and we’d prefer if the crawlers didn’t reach it. Let’s roll up our sleeves, and etch these magic words into our Caddyfile:

blog.example.com {
  @read method GET HEAD
  reverse_proxy @read 127.0.0.1:42069 {
    @fallback status 421
    handle_response @fallback
  }
  root /var/www/blog.example.com
  file_server
}

This will send GET and HEAD requests to iocaine first, and if iocaine instructs Caddy (through an HTTP 421 Misdirected Request response) to serve real contents, we’ll tell Caddy to fall back to serving static files from /var/www/blog.example.com.

If we’d like to put iocaine in front of something that isn’t a static site, but a service we have to reverse proxy, the setup is very similar. Let’s imagine a GoToSocial instance! We want to send GET and HEAD requests through iocaine, but route other methods directly to GoToSocial. Here’s one way to do just that:

sloth.example.com {
  @read method GET HEAD
  reverse_proxy @read 127.0.0.1:42069 {
    @fallback status 421
    handle_response @fallback
  }
  reverse_proxy 127.0.0.1:8080
}

Final remarks

But this is where this guide will end: it’s not a guide to teach you dark Caddy secrets. It’s a guide to get started with Caddy and iocaine, and we’re up and running! Look at us playing with the configuration: isn’t operations fun?

Now go, tweak it further, watch the metrics, and see the Crawlers get trapped in the maze.