Getting started with iocaine

Overview

In this short guide, we will set up iocaine from scratch, including a custom request handler. The request handler shown here is intentionally simple, and limited, but hopefully will serve as a reasonable starting point. Despite its simplicity, it is still powerful, and in this author’s experience, will route the vast majority of unwanted visitors into iocaine’s maze.

This guide shows the same request handler in each of the languages iocaine supports: Roto, Lua, and Fennel. All three versions are equivalent in functionality, so choose whichever language you prefer.

Requirements

This guide assumes a Linux system with systemd, and Caddy as the reverse proxy. These are requirements of this guide, not necessarily of iocaine itself: it can be used with reverse proxies other than Caddy, and works on Linux systems without systemd, and even on non-Linux systems like FreeBSD. A lot of this guide will apply to systems that fall outside of what’s documented here, but some parts will have to be adapted.

Setting up iocaine

Before we do anything else, we need to set up iocaine. To do so, we have to download it, set up a systemd unit for it, and create the configuration, including writing a request handler. We’re going to store everything iocaine related in /opt/iocaine. Let’s begin!

This is the directory tree we’ll be using; the commands shown below assume all of these directories exist:

❯ tree /opt/iocaine
/opt/iocaine
├── bin
├── etc
│   └── systemd
├── libexec
│   └── request-handler
└── share
    └── data
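
If the directories do not exist yet, something along these lines could create them. This is only a sketch: the helper name make_iocaine_tree is made up for this guide and is not part of iocaine, and writing under /opt will usually require root.

```shell
# Hypothetical helper (not part of iocaine): create the directory layout
# shown above under the given root, defaulting to /opt/iocaine.
make_iocaine_tree() {
  root="${1:-/opt/iocaine}"
  mkdir -p \
    "$root/bin" \
    "$root/etc/systemd" \
    "$root/libexec/request-handler" \
    "$root/share/data"
}

# Usage, as a user allowed to write to /opt:
#   make_iocaine_tree /opt/iocaine
```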

Installing iocaine

Assuming you’re installing iocaine on an x86-64 system, you can download pre-built binaries for it:

❯ curl -s https://git.madhouse-project.org/api/packages/iocaine/generic/iocaine-binaries/2.5.1/iocaine-2.5.1.x86_64-linux.zst \
  | unzstd - -o /opt/iocaine/bin/iocaine
❯ chmod +x /opt/iocaine/bin/iocaine

Or, if you’re on aarch64:

❯ curl -s https://git.madhouse-project.org/api/packages/iocaine/generic/iocaine-binaries/2.5.1/iocaine-2.5.1.aarch64-multiplatform.zst \
  | unzstd - -o /opt/iocaine/bin/iocaine
❯ chmod +x /opt/iocaine/bin/iocaine

Next, we’ll need the systemd unit:

❯ curl https://git.madhouse-project.org/iocaine/iocaine/raw/tag/iocaine-2.5.1/data/iocaine.service \
       -o /opt/iocaine/etc/systemd/iocaine.service

This service file needs a few changes to adapt to our location. Edit the file, and change the following lines:

ExecStart=/opt/iocaine/bin/iocaine --config-file /opt/iocaine/etc/config.toml start
ExecReload=/opt/iocaine/bin/iocaine --config-file /opt/iocaine/etc/config.toml reload
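
If you would rather not edit the file by hand, the same two changes can be sketched with sed. The helper name patch_iocaine_unit is made up for this guide, and the sketch assumes that the downloaded unit has exactly one line starting with ExecStart= and one starting with ExecReload=, and that your sed supports in-place editing with -i (GNU sed does). Check the result before installing it.

```shell
# Hypothetical helper: point ExecStart and ExecReload at our /opt/iocaine
# layout. Assumes one "ExecStart=" and one "ExecReload=" line in the unit,
# and a sed with in-place editing (-i), such as GNU sed.
patch_iocaine_unit() {
  unit="$1"
  bin="/opt/iocaine/bin/iocaine"
  cfg="/opt/iocaine/etc/config.toml"
  sed -i \
    -e "s|^ExecStart=.*|ExecStart=$bin --config-file $cfg start|" \
    -e "s|^ExecReload=.*|ExecReload=$bin --config-file $cfg reload|" \
    "$unit"
}

# Usage:
#   patch_iocaine_unit /opt/iocaine/etc/systemd/iocaine.service
```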

We could install the service file now, but we won’t, yet. Let’s configure iocaine first!

Configuring iocaine

Before we can run iocaine, we’ll need to source a wordlist and some training data - but we’ll do that in the next section. First, let us set up the configuration in advance, so we’ll know what else we need! Without much further ado, place this configuration into /opt/iocaine/etc/config.toml:

[server]
bind = "127.0.0.1:42069"

[server.request-handler]
path = "/opt/iocaine/libexec/request-handler"
language = "roto"

[server.control]
bind = "/run/iocaine/control.socket"
unix_listen_access = "owner"

[metrics]
enable = true
bind = "127.0.0.1:42042"
persist-path = "/var/lib/iocaine/metrics.json"

[sources]
markov = [
  "/opt/iocaine/share/data/1984.txt",
  "/opt/iocaine/share/data/brave-new-world.txt"
]
words = "/opt/iocaine/share/data/words.txt"

If you look closely, you’ll notice that the control socket is bound to /run/iocaine/control.socket, outside of /opt/iocaine. That’s because systemd will create and manage /run/iocaine for us, so we’re making good use of it. Similarly, the metrics persist-path points to a directory managed by systemd, so we don’t have to manage it ourselves. We do this because in this setup systemd also manages the iocaine user dynamically, rather than us creating it. If we were to put this stateful data somewhere under /opt/iocaine, setting up permissions would be complicated. We side-step that entire problem by letting systemd do all of it for us.

Download auxiliary data

To be able to serve garbage, iocaine requires a training set and a word list - we’ll download some in a moment! We’re also going to need the robots.json file of the ai.robots.txt project. Let’s place all of those in /opt/iocaine/share/data!

❯ curl -L https://archive.org/download/GeorgeOrwells1984/1984_djvu.txt \
       -o /opt/iocaine/share/data/1984.txt
❯ curl -L https://archive.org/download/ost-english-brave_new_world_aldous_huxley/Brave_New_World_Aldous_Huxley_djvu.txt \
       -o /opt/iocaine/share/data/brave-new-world.txt
❯ curl -L https://git.savannah.gnu.org/cgit/miscfiles.git/plain/web2 \
       -o /opt/iocaine/share/data/words.txt

We have a training set and a wordlist now, yay! Let’s fetch robots.json too:

❯ curl -L https://github.com/ai-robots-txt/ai.robots.txt/raw/refs/heads/main/robots.json \
       -o /opt/iocaine/share/data/ai.robots.txt-robots.json

You may wish to update this file from time to time. A systemd timer that does so daily, or weekly, and then reloads iocaine would work perfectly for this task.
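
As a rough sketch of what such an update job could run: the function names below are made up for this guide, and the sanity check is deliberately crude (it only checks that the download is non-empty and starts with an opening brace, it does not parse the JSON). The temporary file is made world-readable before it replaces the live copy, on the assumption that iocaine runs as a different user than the one doing the update.

```shell
# Hypothetical update helper: fetch robots.json to a temporary file and only
# replace the live copy if the download looks like a JSON object.
ROBOTS_URL="https://github.com/ai-robots-txt/ai.robots.txt/raw/refs/heads/main/robots.json"

# Crude sanity check: non-empty and starting with "{". Not a JSON parser.
looks_like_json_object() {
  [ -s "$1" ] && [ "$(head -c 1 "$1")" = "{" ]
}

update_robots_json() {
  dest="${1:-/opt/iocaine/share/data/ai.robots.txt-robots.json}"
  tmp="$(mktemp "$dest.XXXXXX")" || return 1
  if curl -fsSL "$ROBOTS_URL" -o "$tmp" && looks_like_json_object "$tmp"; then
    # mktemp creates the file with mode 0600; open it up so that the
    # (dynamically created) iocaine user can still read it.
    chmod 644 "$tmp"
    mv "$tmp" "$dest"
  else
    rm -f "$tmp"
    return 1
  fi
}

# Usage, for example from a script started by a systemd timer:
#   update_robots_json && systemctl reload iocaine
```

Downloading to a temporary file first means a failed or truncated download never replaces a working robots.json, and the reload only happens when the update succeeded.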

Writing the request handler

By far the most interesting part - in this author’s absolutely not heavily biased opinion - of setting up iocaine is writing the request handler. In the sections below, you will find the same script implemented in all three languages iocaine supports. The script is intentionally simple and limited: it is meant to be an understandable starting point, not a be-all-end-all solution like Nam-Shub of Enki.

We will be using efficient substring matching, JSON parsing, and metrics - a surprising lot for such a short script, but we’re gonna make good use of all of these core features.

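The script is essentially three simple checks: is the user agent listed in ai.robots.txt, is it on our own list of unwanted visitors, and is it pretending to be a major browser without behaving like one?

If the user agent is found in ai.robots.txt, or on our list, or if it fails at pretense, the request will be served garbage; otherwise it will be let through. We’ll also utilize metrics to keep tabs on how effective each rule is.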

How do we check whether a user agent is a real browser or a pretender? Most of the bots fail at pretense, because they only change the user-agent header. Unfortunately for them, all major browsers also send a sec-fetch-mode header. So this check is simple: is the user-agent a major browser? If so, did it send a sec-fetch-mode header? If it pretends to be a major browser, but fails to send the header, we’ll serve it garbage. Surprisingly simple, yet highly effective.

Roto

The default language is Roto, and the configuration above sets it as the language too, so we’ll write that first! Place the following into /opt/iocaine/libexec/request-handler/pkg.roto:

function init() -> Verdict[Unit, String] {
  let robot_list = Json
    .load_file("/opt/iocaine/share/data/ai.robots.txt-robots.json")
    .get_keys();
  iocaine_patterns.insert_patterns("ai.robots.txt", robot_list);

  let major_browser_patterns = MutableStringList.new();
  major_browser_patterns.push("Chrome/");
  major_browser_patterns.push("Firefox");
  iocaine_patterns.insert_patterns("major-browsers", major_browser_patterns);

  let unwanted_visitors = MutableStringList.new();
  unwanted_visitors.push("Perplexity");
  iocaine_patterns.insert_patterns("unwanted-visitors", unwanted_visitors);

  accept
}

function decide(request: Request) -> Verdict[Outcome, Outcome] {
  let robot_patterns = iocaine_patterns.get("ai.robots.txt");
  if robot_patterns.is_match(request.header("user-agent")) {
    metrics.inc("rule::ai.robots.txt");
    accept Outcome.garbage()
  }
  let major_browser_patterns = iocaine_patterns.get("major-browsers");
  if major_browser_patterns.is_match(request.header("user-agent"))
    && (request.header("sec-fetch-mode") == "") {
    metrics.inc("rule::major-browser::sec-fetch-mode");
    accept Outcome.garbage()
  }
  let unwanted_visitor_patterns = iocaine_patterns.get("unwanted-visitors");
  if unwanted_visitor_patterns.is_match(request.header("user-agent")) {
    metrics.inc("rule::unwanted-visitor");
    accept Outcome.garbage()
  }
  metrics.inc("rule::default");
  reject Outcome.not_for_us();
}

test bad_robot {
  let request = RequestBuilder.new("GET", "/hello-world")
    .header("host", "localhost")
    .user_agent("Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ClaudeBot/1.0; +claudebot@anthropic.com)")
    .build();
  match decide(request) {
    Accept(_) -> accept,
    Reject(_) -> reject,
  }
}

test good_user_agent {
  let request = RequestBuilder.new("GET", "/hello-world")
    .header("host", "localhost")
    .header("sec-fetch-mode", "document")
    .user_agent("Mozilla/5.0 (X11; Linux x86_64; rv:139.0) Gecko/20100101 Firefox/139.0")
    .build();
  match decide(request) {
    Accept(_) -> reject,
    Reject(_) -> accept,
  }
}

test faked_major_browser {
  let request = RequestBuilder.new("GET", "/hello-world")
    .header("host", "localhost")
    .user_agent("Mozilla/5.0 (X11; Linux x86_64; rv:139.0) Gecko/20100101 Firefox/139.0")
    .build();
  match decide(request) {
    Accept(_) -> accept,
    Reject(_) -> reject,
  }
}

Even tests are included! Because we can do that, and tests are incredibly useful when we’re building complex things. This isn’t complex yet, but it will be, if we grow it further.

Lua

An alternative language iocaine supports is Lua. It is less performant than Roto, but it is a much better known language, with fewer limitations than Roto. Let’s write the handler in Lua too! Place this into /opt/iocaine/libexec/request-handler/main.lua:

function load_robots_json_patterns(path)
  local data = iocaine.json.load(path)
  local keys = {}
  for k, _ in pairs(data) do
    table.insert(keys, k)
  end

  return iocaine.Patterns(table.unpack(keys))
end

local robot_patterns =
  load_robots_json_patterns("/opt/iocaine/share/data/ai.robots.txt-robots.json")
local major_browser_patterns =
  iocaine.Patterns("Chrome/", "Firefox");
local unwanted_visitor_patterns =
  iocaine.Patterns("Perplexity");

function decide(request)
  if robot_patterns:contains(request:header("user-agent")) then
    if iocaine.metrics then iocaine.metrics:inc("rule::ai.robots.txt") end
    return iocaine.outcome.Garbage
  end
  if major_browser_patterns:contains(request:header("user-agent")) and
     (request:header("sec-fetch-mode") == nil) then
     if iocaine.metrics then iocaine.metrics:inc("rule::major-browser::sec-fetch-mode") end
     return iocaine.outcome.Garbage
  end
  if unwanted_visitor_patterns:contains(request:header("user-agent")) then
     if iocaine.metrics then iocaine.metrics:inc("rule::unwanted-visitor") end
     return iocaine.outcome.Garbage
  end
  if iocaine.metrics then iocaine.metrics:inc("rule::default") end
  return iocaine.outcome.NotForUs
end

function run_tests()
  local bad_bot_request = iocaine.Request("GET", "/hello-world")
  bad_bot_request:set_headers_from({
    ["host"] = "localhost",
    ["user-agent"] = "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ClaudeBot/1.0; +claudebot@anthropic.com)",
  })
  io.write("bad bot test... ")
  assert(decide(bad_bot_request) == iocaine.outcome.Garbage)
  print("ok")

  local good_request = iocaine.Request("GET", "/hello-world")
  good_request:set_headers_from({
    ["host"] = "localhost",
    ["sec-fetch-mode"] = "document",
    ["user-agent"] = "Mozilla/5.0 (X11; Linux x86_64; rv:139.0) Gecko/20100101 Firefox/139.0"
  })
  io.write("good request test... ")
  assert(decide(good_request) == iocaine.outcome.NotForUs)
  print("ok")

  local faked_browser_request = iocaine.Request("GET", "/hello-world")
  faked_browser_request:set_headers_from({
    ["host"] = "localhost",
    ["user-agent"] = "Mozilla/5.0 (X11; Linux x86_64; rv:139.0) Gecko/20100101 Firefox/139.0",
  })
  io.write("faked browser test... ")
  assert(decide(faked_browser_request) == iocaine.outcome.Garbage)
  print("ok")

  return true
end

This ended up being very similar in structure to the Roto request handler. Apart from the syntax, there are two major differences. First, Lua does not require an init() function: it will happily execute the body of the script on startup. Second, we can reference variables set up during startup directly from our decide() function, so there is no need to push them into a special variable like in Roto.

If we wish to use the Lua version of the request handler, we need to adjust the configuration a little, and change the language setting of the [server.request-handler] section:

[server.request-handler]
path = "/opt/iocaine/libexec/request-handler"
language = "lua"

Fennel

The entire reason Lua was added as an alternative language is so that this author could write request handlers in Fennel, because he has a soft spot for Lisp languages, and for Fennel in particular. For this reason, the Fennel version of the request handler will not be a direct translation of the Lua version, but will follow a more idiomatic structure.

Place the following code into /opt/iocaine/libexec/request-handler/main.fnl:

(fn load-robots-json-patterns [path]
  (let [data (iocaine.json.load path)]
    (-> (icollect [k _ (pairs data)] k)
        table.unpack
        iocaine.Patterns)))

(local robot-patterns
       (load-robots-json-patterns "/opt/iocaine/share/data/ai.robots.txt-robots.json"))
(local major-browser-patterns
       (iocaine.Patterns "Chrome/" "Firefox"))
(local unwanted-visitor-patterns (iocaine.Patterns "Perplexity"))

(fn is-in-ai-robots-txt? [request]
  (when (robot-patterns:contains (request:header "user-agent"))
    (when iocaine.metrics (iocaine.metrics:inc "rule::ai.robots.txt"))
    iocaine.outcome.Garbage))

(fn is-faked-browser? [request]
  (when (and (major-browser-patterns:contains (request:header "user-agent"))
             (= (request:header "sec-fetch-mode") nil))
    (when iocaine.metrics (iocaine.metrics:inc "rule::major-browser::sec-fetch-mode"))
    iocaine.outcome.Garbage))

(fn is-unwanted-visitor? [request]
  (when (unwanted-visitor-patterns:contains (request:header "user-agent"))
    (when iocaine.metrics (iocaine.metrics:inc "rule::unwanted-visitor"))
    iocaine.outcome.Garbage))

(fn default [request]
  (when iocaine.metrics (iocaine.metrics:inc "rule::default"))
  iocaine.outcome.NotForUs)

(local ruleset [is-in-ai-robots-txt?
                is-faked-browser?
                is-unwanted-visitor?
                default])

(fn decide [request]
  (accumulate [outcome nil
               _ f (ipairs ruleset)
               &until (not= outcome nil)]
    (f request)))

(fn run_tests []
  (fn make-request [user-agent headers]
    (let [request (doto (iocaine.Request "GET" "/hello-world")
                    (: :set_headers_from {:host "localhost"
                                          :user-agent user-agent}))]
      (when headers (each [name value (pairs headers)]
                      (request:set_header name value)))
      request))
  (fn run-test [name request expected-outcome]
    (io.write (.. name " test... "))
    (-> (decide request)
        (= expected-outcome)
        assert)
    (print "ok"))

  (run-test
   "bad bot"
   (make-request "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ClaudeBot/1.0; +claudebot@anthropic.com)")
   iocaine.outcome.Garbage)
  (run-test
   "faked browser"
   (make-request "Mozilla/5.0 (X11; Linux x86_64; rv:139.0) Gecko/20100101 Firefox/139.0")
   iocaine.outcome.Garbage)
  (run-test
   "good browser"
   (make-request "Mozilla/5.0 (X11; Linux x86_64; rv:139.0) Gecko/20100101 Firefox/139.0"
                 {:sec-fetch-mode "document"})
   iocaine.outcome.NotForUs)

  true)

{:decide decide
 :run_tests run_tests}

I love the warm embrace of parens in the morning. If we wish to use the Fennel version of the request handler, we need to adjust the configuration a little, and change the language setting of the [server.request-handler] section, and also tell iocaine where to find the Fennel compiler:

[server.request-handler]
path = "/opt/iocaine/libexec/request-handler"
language = "fennel"
options = { compiler = "/opt/iocaine/libexec/fennel.lua" }

We will, of course, have to put the compiler there:

❯ curl -L https://fennel-lang.org/downloads/fennel-1.5.3.lua \
       -o /opt/iocaine/libexec/fennel.lua

Nice. We can use Fennel now! An alternative option would have been to compile Fennel to Lua ourselves, and use it through the Lua engine, but… why do something ourselves when iocaine can do it for us?

Setting up the iocaine service

Now that we have iocaine configured, to use it, we’ll have to start it. As explained in the beginning, we’ll be using systemd for that. If you’re on a distribution that does not use it, or you’re on a BSD, you will have to adapt this section to the init system you’re using.

For systemd, we simply copy (or move, your call!) the edited service file to /etc/systemd/system:

❯ sudo cp /opt/iocaine/etc/systemd/iocaine.service /etc/systemd/system/
❯ sudo systemctl daemon-reload
❯ sudo systemctl enable iocaine
❯ sudo systemctl start iocaine

Nothing extraordinary here: this is how you set up any systemd service, after all.

Testing the setup so far

We should have everything ready on the iocaine side of things, time to test the setup! Because every variant of the request handler includes tests, we can run iocaine test to run those tests:

❯ /opt/iocaine/bin/iocaine --config-file /opt/iocaine/etc/config.toml test
Test 1 / 3: pkg.faked_major_browser... ok
Test 2 / 3: pkg.bad_robot... ok
Test 3 / 3: pkg.good_user_agent... ok
Ran 3 tests, 3 succeeded, 0 failed

Running the tests for the Fennel and Lua versions will result in output similar to the Roto one above. We can also test it by hand with curl:

❯ curl -i http://127.0.0.1:42069/hello-world
HTTP/1.1 421 Misdirected Request
content-type: text/plain; charset=utf-8
content-length: 0
date: Sat, 12 Jul 2025 11:30:04 GMT
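
A plain curl request only exercises the default rule. To poke at the faked browser rule by hand, we can send a browser-like user agent with and without the sec-fetch-mode header. The sketch below is an illustration only: probe_status, IOCAINE_URL and FIREFOX_UA are names made up for this guide, and the URL assumes the bind address from the configuration above.

```shell
# Hypothetical helper: send a GET request with the given user agent and an
# optional extra header, and print only the HTTP status code.
# IOCAINE_URL is an assumption matching the bind address used in this guide.
IOCAINE_URL="${IOCAINE_URL:-http://127.0.0.1:42069}"
FIREFOX_UA="Mozilla/5.0 (X11; Linux x86_64; rv:139.0) Gecko/20100101 Firefox/139.0"

probe_status() {
  ua="$1"
  extra_header="${2:-}"
  if [ -n "$extra_header" ]; then
    curl -s -o /dev/null -w '%{http_code}\n' \
      -A "$ua" -H "$extra_header" "$IOCAINE_URL/hello-world"
  else
    curl -s -o /dev/null -w '%{http_code}\n' \
      -A "$ua" "$IOCAINE_URL/hello-world"
  fi
}

# Usage:
#   probe_status "$FIREFOX_UA"                              # faked browser
#   probe_status "$FIREFOX_UA" "sec-fetch-mode: navigate"   # looks real
```

If the handler behaves the way its tests describe, the second call should print 421, just like the plain curl request above, while the first call should print a different status, because that request is served garbage instead.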

Caddy configuration

We’re done with the hard parts now: iocaine is up and running, with a request handler, written in the programming language of our choosing. We even enabled metrics! But we’re not going to set up anything more on that front - see the monitoring howto for that! What we will do, however, is front it with Caddy. In this guide, we’re not going to use any fancy features of Caddy, see the fronting with Caddy HOWTO for more information about integrating iocaine with Caddy.

Say we have a GoToSocial server. We can make a lot of the bots go away with this simple trick:

sloth.example.com {
  @read method GET HEAD
  @not-read not {
    method GET HEAD
  }
  reverse_proxy @read 127.0.0.1:42069 {
    @fallback status 421
    handle_response @fallback {
      reverse_proxy 127.0.0.1:8080
    }
  }
  handle @not-read {
    reverse_proxy 127.0.0.1:8080
  }
}

This sends GET and HEAD requests through iocaine first, but reverse proxies any other method to GoToSocial directly. For static sites, where methods other than these two don’t make much sense, we can write something along these lines instead:

blog.example.com {
  @read method GET HEAD
  @not-read not {
    method GET HEAD
  }
  reverse_proxy @read 127.0.0.1:42069 {
    @fallback status 421
    handle_response @fallback {
      root /var/www/blog.example.com
      file_server
    }
  }
  handle @not-read {
    respond 405
  }
}

Final remarks

But this is where this guide will end: it’s not a guide to teach you dark Caddy secrets. It’s a guide to get started with Caddy and iocaine, and we’re up and running! Look at us playing with the configuration: isn’t operations fun?

Now go, tweak it further, watch the metrics, and see the crawlers get trapped in the maze. And if you’re looking for a challenge, and a complex configuration to get inspiration from (or perhaps to use it!), you’re welcome to have a peek at Nam-Shub of Enki.