Overview
While iocaine is advertised as an aggressive defense system, in reality, it’s a shell around a scripting environment. It’s the script iocaine runs in this environment that does most of the work, while iocaine provides the runtime and the tools to achieve this. It comes with a built-in request handler script, one that’s fairly efficient, and can catch a lot of things, but it has to strike a fine balance between being effective and easy to modify. There’s also Nam-Shub of Enki, the author’s personal script - with a lot more features than the built-in, but at the cost of being much, much more complex.
Nevertheless, the request handler script is the heart of iocaine. In this reference guide, we will explore the entire environment: when and how it gains control, what kind of tools it provides, with examples.
Scripts can be written in three different languages: Roto, Lua, and Fennel. As of this writing, iocaine supports Roto 0.9, Lua 5.4, and comes with a Fennel 1.6.0 compiler. As far as the runtime is concerned, these have feature parity: everything iocaine-specific is available in all of them, perhaps in a different form that makes more sense for the language, but it is all there. When we show examples, we will show them for all three languages.
Which language you choose is largely up to you. Lua and Fennel perform the same, and while Roto is considerably faster than both, in many cases, the request handler will spend most of its time generating garbage - and that is Rust code, shared between all three runtimes. The language you choose for the request handler will not be your bottleneck - choose one you’re familiar with, or one you like, or want to play with. Playing with your tools is valid, and encouraged. It’s fun! Especially when you get to annoy the heck out of the crawlers.
But I digress. Let’s put up a facade of professionalism, and move on to a brief description of how data flows through iocaine!
Execution flow
The script running within iocaine is long running: for every server (http or haproxy-spoa), iocaine starts a new instance of the script. Any incoming request will be sent to the appropriate request handler. If sent to a http-server, iocaine will first call the handler’s decide function, which has to decide the fate of the request. The output of this function will be used to call the output function, to generate a response. When using a haproxy-spoa-server, only decide is called, and its response is sent back to HAProxy.
The idea here is that when starting up, the script will set up a number of pattern matchers, will train any garbage generators, so that the functions called by iocaine will not need to perform expensive work, and can make decisions fast, and generate garbage efficiently, if need be.
The different instances of the handler scripts are isolated, and cannot influence each other. But each instance shares a common runtime, one that is not sandboxed: request handlers can change global state, but doing so outside of the initialization phase is highly discouraged.
Configuration
Configuration for the request handler can be declared in iocaine’s configuration file. This is not interpreted by iocaine itself, but is serialized to a format the language of the request handler understands. In case of Lua and Fennel, the configuration is transformed into a Lua table, and will be available as iocaine.config. For Roto, it’s transformed to a HashMap. See the next section for more information on what’s available in the standard environment!
The standard runtime environment
For Lua and Fennel, an iocaine table is provided in the runtime environment, and is available to the script in every phase. This table is home to all of the functions provided by iocaine, along with the configuration (in iocaine.config) and the path to the script (in iocaine.script_path; this is the same value as set in declare-handler’s path property).
Roto does not have a big standard library, and the library iocaine provides to the Roto environment is not collected under a single module, as you’ll see below. Unlike Lua and Fennel, Roto does not have globals by default. As such, in addition to the configuration (available as config) and the script path (script_path), the shared context also includes globals and rng. We’ll talk about the latter elsewhere.
Important
Added in iocaine 3.1.0, the iocaine table for Lua and Fennel has an instance_id key. For Roto, the same thing is available simply as instance_id. The value of it is a base64 encoded UUID derived from the global instance-id and the name of the server the script was instantiated for.
Roto-specific runtime library
Roto is a much more limited language than Lua or Fennel. It doesn’t have lists, nor tables, for example. As such, handling such things as the configuration, or list of strings, or the deserialization of various file formats would be impossible if iocaine didn’t provide something. It doesn’t quite provide the kind of convenience and power Lua tables do, but it provides enough to get things done in a sensible way.
The following types are made available in the runtime library for Roto:
Value
A Value is iocaine’s implementation of a dynamic value. It can hold a boolean, a signed 64-bit integer or a 64-bit float, a string, a Vector, or a HashMap.
There is no new function: to construct a Value, we need to call .into_value() on the source type. For example:
let b = true;
let b_value = b.into_value();
let answer = 42.0;
let answer_value = answer.into_value();
To turn a Value back to its original type, we can call an “as_” method on the Value. The name of the method depends on the type we want to extract: as_bool for booleans; as_int or as_float for integers and floats, respectively; as_str for strings, as_vector for Vectors; and as_map for HashMaps. All of these methods return an Option[T], where T is the desired type. If the contained Value is not of the expected type, the method returns None - there are no conversions made.
Similar to the “as_” family of methods, there’s an “is_” family too, with similar naming. These return a bool, and can be used to check whether a Value is of a particular type.
To showcase how to use these:
let b = true;
let b_value = b.into_value();
# This will short-circuit, and return from the function
# if `b_value.as_bool()` returns `None`.
if b_value.as_bool()? {
    # do something
}
# This can handle `None` gracefully.
match b_value.as_int() {
    None -> {
        # do something...
    },
    Some(_) -> (),
}
# Or you can do something like this:
let r = match b_value.as_int() {
    None -> 42,
    Some(i) -> i,
};
Values can also be serialized into various formats:
let example = HashMap.new();
example.insert_str("hello", "world");
example.insert_bool("example", true);
example.insert_int("answer", 42);
# json:
# {"hello": "world", "example": true, "answer": 42}
let json = example.into_value().to_json()?;
# yaml:
# hello: "world"
# example: true
# answer: 42
let yaml = example.into_value().to_yaml()?;
# toml:
# hello = "world"
# example = true
# answer = 42
let toml = example.into_value().to_toml()?;
Vector
Vectors are iocaine’s approximation of a dynamically typed list. A Vector can contain any number of elements, of any type Value supports. The available methods are very limited, though: Vectors are intended to be write-only, and to be consumed on the Rust side.
Available methods are:
Vector.new() to create a new, empty Vector.
Vector.push(value) to push a new Value into the vector.
Vector.as_string_list() to try and convert the Vector to a StringList. Returns None if the vector could not be converted.
Vector.len() returns the length of the list. (Added in iocaine 3.2.0)
Vector.nth(index) returns the element at index, if any. (Added in iocaine 3.2.0)
HashMap
HashMaps are iocaine’s simulation of a dynamic table: they can store key-value pairs, where the key is always a string, and the value is a dynamic Value. The type also provides convenient ways to traverse a nested HashMap.
The following methods are available for the HashMap type:
HashMap.new() creates a new, empty HashMap, and returns it.
map.insert(key, value) inserts a new Value under the key named by key into the HashMap. Returns the HashMap itself, and can be chained.
map.has(key) returns whether the HashMap contains the named key.
map.get(key) returns the Value at key, or None if the key is not available.
map.has_path(path): returns whether a nested path exists within the HashMap.
map.get_path(path): returns the Value at the nested path, or None if no such path is found.
map.get_or(key, fallback): returns the Value at key, or fallback (a Value type) if the key is not found.
map.get_path_or(path, fallback): returns the Value at the nested path, or fallback (a Value type) if the path is not found.
map.keys(): returns a StringList of the keys available in the map.
The path-family of functions traverse a dot-notated path: they split the string by dots, and try to look up each key. All components but the last must be a HashMap. This is best shown via an example:
let map = HashMap.new();
let foo_map = HashMap.new();
let bar_map = HashMap.new();
bar_map.insert_str("baz", "yes");
foo_map.insert_map("bar", bar_map);
map.insert_map("foo", foo_map);
# `map` at this time would serialize to the following JSON:
# {"foo": {"bar": {"baz": "yes"}}}
if map.get_path_as_str("foo.bar.baz")? == "yes" {
    accept
}
reject
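To make the traversal rule concrete, here is a small model of it in plain Lua. This is only a sketch of the behaviour described above, not iocaine’s own implementation; the helper name get_path is hypothetical, and a nested Lua table stands in for the HashMap.

```lua
-- Hypothetical helper modelling the dot-notated lookup described above.
-- Returns the value at `path`, or nil when a component is missing or when
-- a non-final component is not a table.
local function get_path(map, path)
  local current = map
  for key in string.gmatch(path, "[^.]+") do
    if type(current) ~= "table" then
      return nil
    end
    current = current[key]
    if current == nil then
      return nil
    end
  end
  return current
end

local map = { foo = { bar = { baz = "yes" } } }

print(get_path(map, "foo.bar.baz"))      -- yes
print(get_path(map, "foo.missing.baz"))  -- nil
print(get_path(map, "foo.bar.baz.qux"))  -- nil ("yes" is not a table)
```

The last call shows why every component but the last has to be a map: once the lookup reaches a plain value, there is nothing left to descend into.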
On top of the functions listed above, various get and insert helpers are also available, one for each type a Value can hold. Just like the Value type provides the as_<type> family of methods, HashMap has insert_<type>, get_as_<type>, get_path_as_<type>, get_as_<type>_or, and get_path_as_<type>_or.
let map = HashMap.new();
map.insert_bool("bool", true);
map.insert_str("str", "string!");
map.insert_int("int", 42);
let example1 = map.get_as_bool("bool")?;
let example2 = map.get_as_str("str")?;
let example3 = map.get_as_int("int")?;
StringList
Because Roto does not have lists yet, and there are functions and methods provided by the environment that require a list of strings, specifically (where Vector is inadequate, because it doesn’t guarantee the members will be strings), a StringList type is available in the runtime library.
It provides the following methods:
StringList.new() to create a new, empty string list.
StringList.push(str) to push a new string to the end of the list.
StringList.join(separator) to join the list into a string, separated by separator.
StringList.concat() to concatenate the list into a string, without a separator.
StringList.contains(key) returns whether the list contains a particular string.
StringList.is_empty() returns whether the list is empty.
StringList.len() returns the length of the list. (Added in iocaine 3.2.0)
StringList.nth(index) returns the element at index, if any. (Added in iocaine 3.2.0)
Additionally, strings have a split_by(delimiter) method, so they can be split into a StringList:
let list = "foo,bar,baz".split_by(",");
let example = list.concat();
if example == "foobarbaz" {
accept
}
reject
globals
Unlike Lua and Fennel, Roto does not have global variables. Yet, to work efficiently, an iocaine handler script must be able to pre-compute a number of things: pattern matchers, various kinds of garbage, perhaps even configuration. The way this is achieved is by splitting the handler into two phases: an init phase and a main phase. While the globals table is available in both phases, it is strongly recommended to only use it in the init phase, because retrieving values out of the table is costly.
If it is costly, how else are we supposed to access globals in the main phase? Great question! At the end of the init phase, once the init function has returned, every entry in globals will be unrolled into a constant. Let me demonstrate this with an example!
# At init time:
globals.add("EXAMPLE_BOOL", true.into_global());
# At main time:
if EXAMPLE_BOOL {
# Do something...
}
The globals table has two methods:
globals.add(key, value), where key must be a string, and value is any type that has an .into_global() method.
globals.get(key) returns the Global-wrapped value associated with key, if any, or None. This value needs to be unwrapped before being useful - see below.
Any key that is a valid Roto identifier will be made available as a constant in the main phase.
Most things you may wish to store globally have an .into_global() method. A list of such things will be added to this reference manual at a later point.
Initialization
Initialization is performed only once, at startup, once for each instance of every declared handler.
Lua and Fennel
For Lua and Fennel, the request handler is loaded from the declared path by performing require("main"). It is expected to return a table with three keys: decide, output, and run_tests. Only output is required; the other two are optional. If there’s no decide key in the table, iocaine will skip decision making, and will always call output with an empty decision string. Servers that require a decide function will not work if the function is missing.
Because Lua runs the body of a module when require-ing it, the main module can set up global tables to hold the trained garbage generators, pattern matchers, and whatever else needs to be made available for the decide and output functions.
In case of Lua and Fennel, all phases run in the same interpreter, there is no isolation between them.
Roto
For Roto, things are a little bit more complicated, because Roto does not have built-in globals. We simulate them. There is no script body to run outside of functions, either, so Roto provides an init function:
fn init(metrics: Metrics) -> ()? {
Some(())
}
While Lua and Fennel include Metrics in the global iocaine table, there are no globals in Roto, so it is passed as an argument to the init() function. See the metrics documentation for more information on why this is only passed to init, and not to any other function.
Unlike Lua and Fennel, iocaine runs the Roto init function in a separate runtime environment from the main functions. This follows from how globals are simulated: the unpacked constants only exist once init has finished, so they would not be available at compile time if both phases shared one environment. This means that Roto handlers will require two directories under the script path: init and main, for the two phases, respectively.
decide and output functions
The decide(request) and output(request, decision) functions are the heart of a request handler. iocaine calls these whenever it needs to decide the fate of an incoming request, or needs to generate output. That output is not necessarily garbage, though: it includes responding to the reverse proxy to direct it to serve the real contents, too.
decide
The decide(request) function receives a Request, and is expected to produce a String (or a language-specific error). Its output will be sent to the output function, if output generation is needed.
If an error is returned, iocaine will respond with an Internal Server Error to the reverse proxy.
Examples
Lua
function decide(request)
return "default"
end
On error, the function should return nil.
Fennel
(fn decide [request] "" ;; <- this is a docstring
"default"
)
On error, the function should return nil.
Roto
fn decide(request: Request) -> String? {
Some("default")
}
On error, the function should return None.
output
The output(request, decision) function is responsible for generating output. It receives the same request decide does, and the output of decide() too, if it is available. The function is expected to return a Response, or a language-specific error.
If an error is returned, iocaine will respond with an Internal Server Error to the reverse proxy.
Examples
Lua
function output(request, decision)
local decision = decision or "default"
local response = iocaine.Response()
response.status = 421
return response
end
On error, the function should return nil.
Fennel
(fn output [request decision]
(local decision (or decision "default"))
(local response (iocaine.Response))
(set response.status 421)
response)
On error, the function should return nil.
Roto
fn output(request: Request, maybe_decision: String?) -> Response? {
    let decision = match maybe_decision {
        Some(v) -> v,
        None -> "default",
    };
    let response = ResponseBuilder.new();
    response.status_code(421);
    Some(response.build())
}
On error, the function should return None.
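Putting decide and output together: the sketch below models, in plain Lua, how an http-server is described to drive a handler. It illustrates the flow laid out in this guide and is not iocaine’s actual dispatch code; handle_http and the table-shaped requests and responses are hypothetical stand-ins for the real Request and Response types.

```lua
-- Hypothetical model of the dispatch described in this guide.
-- `handler` is the table returned by the handler module.
local function handle_http(handler, request)
  local decision = ""            -- used when the handler has no `decide`
  if handler.decide then
    decision = handler.decide(request)
    if decision == nil then      -- `decide` signalled an error
      return { status = 500 }    -- Internal Server Error
    end
  end

  local response = handler.output(request, decision)
  if response == nil then        -- `output` signalled an error
    return { status = 500 }
  end
  return response
end

-- A toy handler: plain tables stand in for Request and Response objects.
local handler = {
  decide = function(request)
    if request.path == "/trap" then
      return "garbage"
    end
    return "default"
  end,
  output = function(request, decision)
    if decision == "garbage" then
      return { status = 200, body = "garbage goes here" }
    end
    return { status = 421 }
  end,
}

print(handle_http(handler, { path = "/trap" }).status)   -- 200
print(handle_http(handler, { path = "/index" }).status)  -- 421
```

With a haproxy-spoa-server, only the first half of this flow applies: decide is called, and its result goes back to HAProxy without output being involved.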
Requests
The Request type represents an incoming HTTP request. We can check the HTTP method, the path, any headers, query parameters, or cookies, and iocaine also provides ways to serialize headers, query parameters and cookies into a table (or map, depending on the language).
Requests are rarely constructed within the handler script - only for testing purposes. They’re made available as the parameter of the decide function.
The request’s path
Lua
local path = request.path
Fennel
(local path request.path)
Roto
let path = request.path();
The request method
Lua
local method = request.method
Fennel
(local method request.method)
Roto
let method = request.method();
Request headers
Request headers can be accessed via the header method of a Request object. See below for language-specific syntax:
Lua
local user_agent = request:header("user-agent")
The function returns nil if the header is not found.
Fennel
(local user-agent (request:header "user-agent"))
The function returns nil if the header is not found.
Roto
let user_agent = request.header("user-agent");
The method returns an empty string if the header is not found.
To serialize every header into a table (or map) - for logging purposes for example -, the scripting environment provides a way. See below for the language-specific way to achieve this!
Lua
local headers = request:headers()
Returns a Lua table of the headers.
Fennel
(local headers (request:headers))
Returns a Lua table of the headers.
Roto
let headers = HashMap.new();
request.headers_into_map(headers);
Serializes the headers into a HashMap.
Query parameters
A request’s query parameters can be accessed via the query method of a Request object. See below for language-specific syntax:
Lua
local utm_source = request:query("utm_source")
The function returns nil if the query parameter is not found.
Fennel
(local utm-source (request:query "utm_source"))
The function returns nil if the query parameter is not found.
Roto
let utm_source = request.query("utm_source");
The method returns an empty string if the query parameter is not found.
To serialize every query parameter into a table (or map) - for logging purposes for example -, the scripting environment provides a way. See below for the language-specific way to achieve this!
Lua
local queries = request:queries()
Returns a Lua table of the query parameters.
Fennel
(local queries (request:queries))
Returns a Lua table of the query parameters.
Roto
let queries = HashMap.new();
request.queries_into_map(queries);
Serializes the query parameters into a HashMap.
Cookies
Cookies sent with a request can be accessed via the cookie method of a Request object. See below for language-specific syntax:
Lua
local session_cookie = request:cookie("session_id")
The function returns nil if the cookie is not found.
Fennel
(local session-cookie (request:cookie "session_id"))
The function returns nil if the cookie is not found.
Roto
let session_cookie = request.cookie("session_id");
The method returns an empty string if the cookie is not found.
To serialize every cookie into a table (or map) - for logging purposes for example -, the scripting environment provides a way. See below for the language-specific way to achieve this!
Lua
local cookies = request:cookies()
Returns a Lua table of the cookies.
Fennel
(local cookies (request:cookies))
Returns a Lua table of the cookies.
Roto
let cookies = HashMap.new();
request.cookies_into_map(cookies);
Serializes the cookies into a HashMap.
Constructing a Request for testing
Important
This section is under construction. Please check back later.
Pattern matching
Matchers are an important, core feature of the iocaine environment. They allow you to match a string against various kinds of patterns in an efficient way, providing a near-uniform method. Each of them can - and should be - constructed at initialization time, so that they can be used for matching at decision (or maybe even output!) time.
For Lua and Fennel, all matcher constructors are available under the iocaine.matcher table. For Roto, the constructors are methods on the Matcher type.
Substrings
Perhaps the simplest thing to match against is a set of strings. No globs, no regular expressions, just strings. You can create a pattern matcher, and see if a string contains any of the sub-strings in the matcher, in an efficient way, using SIMD instructions, potentially matching against substrings in parallel. Depending on the number of substrings, this can be more efficient than using the language’s native substring matching.
For Lua and Fennel, the constructor is iocaine.matcher.Patterns, and requires a list of strings. For Roto, the constructor is Matcher.from_patterns, and requires a StringList.
Examples
Lua
-- At init time:
local example = iocaine.matcher.Patterns("ClaudeBot", "Perplexity", "GPTBot")
-- At decision time:
if example:matches(request:header("user-agent")) then
return "garbage"
end
Fennel
;; At init time:
(local example (iocaine.matcher.Patterns "ClaudeBot" "Perplexity" "GPTBot"))
;; At decision time:
(when (example:matches (request:header "user-agent"))
  "garbage")
Roto
# At init time:
let patterns = StringList.new();
patterns.push("ClaudeBot");
patterns.push("Perplexity");
patterns.push("GPTBot");
globals.add("EXAMPLE", Matcher.from_patterns(patterns)?);
# At decision time:
if EXAMPLE.matches(request.header("user-agent")) {
return Some("garbage");
}
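For comparison, this is roughly what the same check looks like with Lua’s native substring search, the approach the matcher is meant to replace once the list of substrings grows. It is a sketch only: contains_any is a hypothetical helper, not part of iocaine.

```lua
-- Hypothetical helper: true if `haystack` contains any of `needles`.
-- The fourth argument to string.find turns off Lua patterns, so the
-- needles are treated as plain substrings.
local function contains_any(haystack, needles)
  for _, needle in ipairs(needles) do
    if string.find(haystack, needle, 1, true) then
      return true
    end
  end
  return false
end

local bots = { "ClaudeBot", "Perplexity", "GPTBot" }

print(contains_any("Mozilla/5.0 (compatible; GPTBot/1.2)", bots))  -- true
print(contains_any("Mozilla/5.0 (X11; Linux x86_64)", bots))       -- false
```

This version scans the input once per substring, so its cost grows with the length of the list.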
Regular expressions
Sometimes substring matching is not enough, because the strings we match against may have variations, non-constant parts. In such cases, regular expressions are a great tool to have. The scripting environment provides not one, but two ways to work with them: you can match against - and extract capture groups from - a single regex, or you can match against a set of regexes, but without the ability to extract capture groups.
The latter can be of use when you wish to group expressions together, when any of them matching would satisfy the condition.
For Lua and Fennel, these matchers can be constructed with iocaine.matcher.Regex and iocaine.matcher.RegexSet. The former takes a single string argument that is compiled to a regexp, the latter requires a list of such strings. For Roto, the constructors are Matcher.from_regex() and Matcher.from_regex_set, respectively. The former takes a string, the latter a StringList.
To extract a capture group, a capture method is provided. Only named capture groups can be extracted. See the language examples below!
Examples
Lua
-- At init time:
local example = iocaine.matcher.Regex("Mozilla/(?<version>\\d+\\.\\d+) ")
local example_set = iocaine.matcher.RegexSet("(?i:.*Bot)", "(?i:Google*)")
-- At decision time
if example:matches(request:header("user-agent")) then
local captured = example:capture(request:header("user-agent"), "version")
if captured ~= "5.0" then
return "garbage"
end
end
if example_set:matches(request:header("user-agent")) then
return "garbage"
end
Fennel
;; At init time:
(local example (iocaine.matcher.Regex "Mozilla/(?<version>\\d+\\.\\d+) "))
(local example_set (iocaine.matcher.RegexSet "(?i:.*Bot)" "(?i:Google*)"))
;; At decision time
(when (example:matches (request:header "user-agent"))
  (let [captured (example:capture (request:header "user-agent") "version")]
    (when (not= captured "5.0")
      "garbage")))
(when (example_set:matches (request:header "user-agent"))
  "garbage")
Roto
To extract a capture group, Roto requires the matcher to be “cast” into the concrete matcher type with the as_regex_matcher() method, see the example below.
# At init time:
globals.add("EXAMPLE", Matcher.from_regex("Mozilla/(?<version>\\d+\\.\\d+) ")?);
let regex_set = StringList.new();
regex_set.push("(?i:.*Bot)");
regex_set.push("(?i:Google*)");
globals.add("EXAMPLE_SET", Matcher.from_regex_set(regex_set)?);
# At decision time:
if EXAMPLE.matches(request.header("user-agent")) {
let captured = EXAMPLE.as_regex_matcher()?
.capture(request.header("user-agent"), "version");
if not captured == "5.0" {
return Some("garbage");
}
}
if EXAMPLE_SET.matches(request.header("user-agent")) {
return Some("garbage");
}
IP Prefixes
Sometimes bots are not identifiable by their user agent, nor do they send any unique headers - but they may originate from specific networks, and sometimes that network is known. Such networks are usually disclosed so that people can allow-list them on their firewall, but we can use this information for the exact opposite purpose.
Usually, a reverse proxy will forward the requesting agent’s IP address through an x-forwarded-for header. We do not wish to do string-based matching against that, unless the network is really tiny. A sizable network would make even parallel string matching inefficient. Instead, we can use the IP Prefix matcher, which can match networks far more efficiently, because it compiles the strings to binary first, among other things.
Lua and Fennel can construct this matcher with iocaine.matcher.IPPrefixes, which takes a list of strings. Roto can construct it via Matcher.from_ip_prefixes, which takes a StringList.
Examples
Lua
-- At init time:
local example = iocaine.matcher.IPPrefixes("128.93.166.0/26")
-- At decision time:
if example:matches(request:header("x-forwarded-for")) then
return "garbage"
end
Fennel
;; At init time:
(local example (iocaine.matcher.IPPrefixes "128.93.166.0/26"))
;; At decision time:
(when (example:matches (request:header "x-forwarded-for"))
"garbage")
Roto
# At init time:
let prefixes = StringList.new();
prefixes.push("128.93.166.0/26");
globals.add("EXAMPLE", Matcher.from_ip_prefixes(prefixes)?);
# At decision time
if EXAMPLE.matches(request.header("x-forwarded-for")) {
    return Some("garbage");
}
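To illustrate why comparing in binary form is cheaper than comparing strings, here is a plain-Lua sketch of IPv4 prefix matching. It rests on several assumptions (IPv4 only, a single prefix, a header holding exactly one address, hypothetical helper names) and does not show how iocaine’s matcher is implemented.

```lua
-- Parse a dotted-quad IPv4 address into a 32-bit integer, or nil.
local function ipv4_to_int(addr)
  local a, b, c, d = addr:match("^(%d+)%.(%d+)%.(%d+)%.(%d+)$")
  if not a then
    return nil
  end
  a, b, c, d = tonumber(a), tonumber(b), tonumber(c), tonumber(d)
  if a > 255 or b > 255 or c > 255 or d > 255 then
    return nil
  end
  return (a << 24) | (b << 16) | (c << 8) | d
end

-- "Compile" a CIDR string once, at init time.
local function compile_prefix(cidr)
  local addr, bits = cidr:match("^(.-)/(%d+)$")
  if not addr then
    return nil
  end
  local base = ipv4_to_int(addr)
  bits = tonumber(bits)
  if not base or bits > 32 then
    return nil
  end
  local mask = (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF
  return { base = base & mask, mask = mask }
end

-- At decision time, matching is one AND and one comparison.
local function prefix_matches(prefix, addr)
  local ip = ipv4_to_int(addr)
  return ip ~= nil and (ip & prefix.mask) == prefix.base
end

local prefix = compile_prefix("128.93.166.0/26")
print(prefix_matches(prefix, "128.93.166.42"))  -- true
print(prefix_matches(prefix, "128.93.166.64"))  -- false
```

The string parsing happens once per prefix at init time; each request then costs one address parse plus a mask-and-compare, however large the network is.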
ASNs & GeoIP
When it is not possible to identify a crawler from headers alone, identifying it by the ASN it came from might be an option. If, for whatever reason, we’d need to block an entire country, GeoIP can help. The common theme between these two is that they both require a database from Maxmind.
There are two ways to use these matchers: we either use the matches method common to all matchers, or we can call the lookup or within methods. The lookup method will return the ASN or Country (the 2-letter ISO code) for a given item, while within will return a boolean indicating whether an item is within a particular ASN or Country.
The lookup method is most useful for logging and metrics purposes, while within lets us compare against a single ASN or Country in any phase, without having to prepare a list at init time.
Examples
Lua
An ASN matcher can be constructed via iocaine.matcher.ASN(db_path[, ASNs...]), where db_path is a path to a Maxmind ASN database in mmdb format, and ASNs are - optional - integers, the ASN numbers, without the AS prefix. The ASNs are required for using the matches method, without those, only the lookup and within methods are available.
A Country matcher can be constructed via iocaine.matcher.Country(db_path[, Countries...]), where db_path is a path to a Maxmind Country database in mmdb format, and Countries are - optional - two-letter ISO country codes. The Countries are required for using the matches method, without those, only the lookup and within methods are available.
Because the constructor returns a generic Matcher, to use the ASN and Country-specific methods, we’ll need to unwrap them first.
-- At init time
local ASN = iocaine.matcher.ASN("GeoLite2-ASN.mmdb", 55990):as_asn_matcher()
local COUNTRY = iocaine.matcher.Country("GeoLite2-Country.mmdb", "HU"):as_country_matcher()
-- At decision time
local asn = ASN:lookup(request:header("x-forwarded-for"))
if ASN:matches(request:header("x-forwarded-for")) then
  return "garbage"
end
local country = COUNTRY:lookup(request:header("x-forwarded-for"))
if COUNTRY:matches(request:header("x-forwarded-for")) then
  return "garbage"
end
Fennel
An ASN matcher can be constructed via (iocaine.matcher.ASN db_path [ASNs...]), where db_path is a path to a Maxmind ASN database in mmdb format, and ASNs are - optional - integers, the ASN numbers, without the AS prefix. The ASNs are required for using the matches method, without those, only the lookup and within methods are available.
A Country matcher can be constructed via (iocaine.matcher.Country db_path [Countries...]), where db_path is a path to a Maxmind Country database in mmdb format, and Countries are - optional - two-letter ISO country codes. The Countries are required for using the matches method, without those, only the lookup and within methods are available.
Because the constructor returns a generic Matcher, to use the ASN and Country-specific methods, we’ll need to unwrap them first.
;; At init time
(local ASN (-> (iocaine.matcher.ASN "GeoLite2-ASN.mmdb" 55990)
(: as_asn_matcher)))
(local COUNTRY (-> (iocaine.matcher.Country "GeoLite2-Country.mmdb" "HU")
(: as_country_matcher)))
;; At decision time
(local asn (ASN:lookup (request:header "x-forwarded-for")))
(when (ASN:matches (request:header "x-forwarded-for"))
"garbage")
(local country (COUNTRY:lookup (request:header "x-forwarded-for")))
(when (COUNTRY:matches (request:header "x-forwarded-for"))
"garbage")
Roto
An ASN matcher can be constructed via Matcher.from_asn_db(db_path, asns), where db_path is a path to a Maxmind ASN database in mmdb format, and asns is a StringList of ASNs (without the AS prefix) the matcher should match against.
A Country matcher can be constructed via Matcher.from_country_db(db_path, countries), where db_path is a path to a Maxmind Country database in mmdb format, and countries is a StringList of two-letter ISO country codes the matcher should match against.
Because the constructor returns a generic Matcher, to use the ASN and Country-specific methods, we’ll need to unwrap them first.
fn init(metrics: Metrics) -> ()? {
    let unwanted_asns = StringList.new();
    unwanted_asns.push("55990");
    globals.add("ASN", Matcher.from_asn_db("GeoLite2-ASN.mmdb", unwanted_asns)?);
    let unwanted_countries = StringList.new();
    unwanted_countries.push("HU");
    globals.add("COUNTRY", Matcher.from_country_db("GeoLite2-Country.mmdb", unwanted_countries)?);
    Some(())
}
fn decide(request: Request) -> String? {
    let asn = ASN.as_asn_matcher()?.lookup(request.header("x-forwarded-for"));
    if ASN.matches(request.header("x-forwarded-for")) {
        return Some("garbage");
    }
    let country = COUNTRY.as_country_matcher()?.lookup(request.header("x-forwarded-for"));
    if COUNTRY.matches(request.header("x-forwarded-for")) {
        return Some("garbage");
    }
    Some("default")
}
Static matchers
When writing a configurable request handler, with optional features, it is sometimes desirable to have matchers that always fail, or always match, to serve as default if a particular feature of the handler is off. Naturally, iocaine has your back.
Examples
Lua
-- At init time:
local always = iocaine.matcher.Always()
local never = iocaine.matcher.Never()
-- At decision time:
if always:matches(request:header("user-agent")) then
  -- this always runs!
end
if never:matches(request:header("user-agent")) then
  -- this never runs
end
Fennel
;; At init time:
(local always (iocaine.matcher.Always))
(local never (iocaine.matcher.Never))
;; At decision time:
(when (always:matches (request:header "user-agent"))
;; this always runs!
)
(when (never:matches (request:header "user-agent"))
;; this never runs
)
Roto
# At init time:
globals.add("ALWAYS", Matcher.always());
globals.add("NEVER", Matcher.never());
# At decision time:
if ALWAYS.matches(request.header("user-agent")) {
# This always runs!
}
if NEVER.matches(request.header("user-agent")) {
# This never runs
}
Sec-CH-UA
Chromium-derived browsers support User Agent Client Hints, something we can use to our advantage: a real browser will always send a syntactically correct header, and will include the necessary components too.
Unlike most other matchers, this is meant to be used during decision making, and cannot be pre-constructed, for the simple reason that it requires a Sec-CH-UA header, only available when processing a request.
Lua and Fennel provide the constructor at iocaine.SecCHUA. Roto provides a SecCHUA type, which can be constructed by calling the as_secchua() method of a string.
Examples
Lua
local secchua = iocaine.SecCHUA(request:header("sec-ch-ua"))
if secchua == nil or not secchua:contains_item("Chromium") then
return "garbage"
end
Fennel
(let [secchua (iocaine.SecCHUA (request:header "sec-ch-ua"))]
(when (or (= secchua nil)
(not (secchua:contains_item "Chromium")))
"garbage"))
Roto
let valid_secchua = match request.header("sec-ch-ua").as_secchua() {
None -> false
Some(v) -> v.contains_item("Chromium")
};
if not valid_secchua {
return Some("garbage");
}
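To make "syntactically correct" a little more concrete, here is a minimal standalone sketch in Python of the kind of check such a matcher could perform. It is not iocaine's implementation: the function names are hypothetical, and the parser only understands the common shape of the header (a comma-separated list of quoted brands, each with a quoted v parameter), not the full Structured Fields grammar.

```python
import re

# One list member: a quoted brand followed by a quoted version parameter.
# Simplifying assumption: every member looks like "Brand";v="123".
_ITEM = re.compile(r'\s*"((?:[^"\\]|\\.)*)"\s*;\s*v="((?:[^"\\]|\\.)*)"\s*')


def parse_sec_ch_ua(header):
    """Return {brand: version}, or None if the header is missing or malformed."""
    if not header:
        return None
    brands = {}
    pos = 0
    while True:
        m = _ITEM.match(header, pos)
        if m is None:
            return None  # not a well-formed list member
        brands[m.group(1)] = m.group(2)
        pos = m.end()
        if pos == len(header):
            return brands
        if header[pos] != ",":
            return None  # junk between list members
        pos += 1


def looks_like_chromium(header):
    """True only if the header parses and names the Chromium brand."""
    brands = parse_sec_ch_ua(header)
    return brands is not None and "Chromium" in brands
```

A request whose Sec-CH-UA header is absent, or does not parse, is treated the same way as one that parses but lacks the expected brand, which mirrors the nil / None handling in the examples above.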
Files and file formats
Our scripts may require auxiliary data, stored in files outside of the script, outside of iocaine. One such case is ai.robots.txt’s robots.json. The scripts run in a limited environment, with no filesystem access, unless explicitly provided by the runtime - and iocaine’s runtime provides those tools.
The environment supports reading files into a string, and serializing/deserializing them to/from tables or maps (the language’s dynamic value type, really).
Language specific types & methods
Lua
The scripting environment provides file access via the iocaine.file table, which has the following methods:
iocaine.file.read_embedded(path): Returns an embedded file’s contents as a string, or nil if the file was not found. See iocaine show embeds for a list of available files.
iocaine.file.read_as_string(path): Loads a file from the filesystem, and returns its contents as a string, or nil on error.
iocaine.file.read_as_json(path): Loads a file from the filesystem, and tries to deserialize it from JSON, into a Lua value. Returns nil on error.
iocaine.file.read_as_yaml(path): Loads a file from the filesystem, and tries to deserialize it from YAML, into a Lua value. Returns nil on error.
iocaine.file.read_as_toml(path): Loads a file from the filesystem, and tries to deserialize it from TOML, into a Lua value. Returns nil on error.
In addition, the iocaine.serde table provides serialization and deserialization methods, letting you serialize any Lua value to JSON, YAML, or TOML, or deserialize the same formats into the appropriate Lua value:
iocaine.serde.parse_json(string): Deserialize string from JSON, into a Lua value. Returns nil on error.
iocaine.serde.parse_yaml(string): Deserialize string from YAML, into a Lua value. Returns nil on error.
iocaine.serde.parse_toml(string): Deserialize string from TOML, into a Lua value. Returns nil on error.
iocaine.serde.to_json(value): Serialize value into a JSON string. Returns nil on error.
iocaine.serde.to_yaml(value): Serialize value into a YAML string. Returns nil on error.
iocaine.serde.to_toml(value): Serialize value into a TOML string. Returns nil on error.
Fennel
The scripting environment provides file access via the iocaine.file table, which has the following methods:
(iocaine.file.read_embedded path): Returns an embedded file’s contents as a string, or nil if the file was not found. See iocaine show embeds for a list of available files.
(iocaine.file.read_as_string path): Loads a file from the filesystem, and returns its contents as a string, or nil on error.
(iocaine.file.read_as_json path): Loads a file from the filesystem, and tries to deserialize it from JSON, into a Lua value. Returns nil on error.
(iocaine.file.read_as_yaml path): Loads a file from the filesystem, and tries to deserialize it from YAML, into a Lua value. Returns nil on error.
(iocaine.file.read_as_toml path): Loads a file from the filesystem, and tries to deserialize it from TOML, into a Lua value. Returns nil on error.
In addition, the iocaine.serde table provides serialization and deserialization methods, letting you serialize any Lua value to JSON, YAML, or TOML, or deserialize the same formats into the appropriate Lua value:
(iocaine.serde.parse_json string): Deserialize string from JSON, into a Lua value. Returns nil on error.
(iocaine.serde.parse_yaml string): Deserialize string from YAML, into a Lua value. Returns nil on error.
(iocaine.serde.parse_toml string): Deserialize string from TOML, into a Lua value. Returns nil on error.
(iocaine.serde.to_json value): Serialize value into a JSON string. Returns nil on error.
(iocaine.serde.to_yaml value): Serialize value into a YAML string. Returns nil on error.
(iocaine.serde.to_toml value): Serialize value into a TOML string. Returns nil on error.
Roto
The scripting environment provides file access via the File type, which has the following methods:
File.read_embedded(path): Returns an embedded file’s contents as a string, or None if the file was not found. See iocaine show embeds for a list of available files.
File.read_as_string(path): Loads a file from the filesystem, and returns its contents as a string, or None on error.
File.read_as_json(path): Loads a file from the filesystem, and tries to deserialize it from JSON, into a Value. Returns None on error.
File.read_as_yaml(path): Loads a file from the filesystem, and tries to deserialize it from YAML, into a Value. Returns None on error.
File.read_as_toml(path): Loads a file from the filesystem, and tries to deserialize it from TOML, into a Value. Returns None on error.
In addition, Roto strings can be deserialized from JSON, YAML, or TOML into Value, and the Value can be serialized back into the same formats too. The following methods are available for strings:
Garbage generation
A big part of iocaine’s power comes from efficient garbage generation. Once we have detected the crawlers, we may wish to serve them garbage, and it is important that generating it is cheap: we want to waste their time, not ours. The garbage generators provided by iocaine are designed to be as efficient as possible, while still generating output that’s hopefully useless for training purposes.
Many of the garbage generators require a random number generator as an argument. The reason for that is that iocaine holds no state, but wants to generate predictable output. Given the same inputs, the expectation is to generate the same output every time. For this to work, we need a predictable (but fair-ish) random number generator.
Random numbers
Provided as part of the runtime environment is a predictable random number generator. It can be seeded based on a Request, or a static string. See below for language-specific examples.
The random number generator provides a single method: in_range, to generate a number within a given range.
Examples
Lua
local rng_req = iocaine.generator.Rng:from_request(request, "default")
local rng_seed = iocaine.generator.Rng:from_seed("some random seed")
local number = rng_req:in_range(0, 42)
Fennel
(local rng-req (iocaine.generator.Rng:from_request request "default"))
(local rng-seed (iocaine.generator.Rng:from_seed "some random seed"))
(local number (rng-req:in_range 0 42))
Roto
let rng_req = rng.from_request(request, "default");
let rng_seed = rng.from_seed("some random seed");
let number = rng_req.in_range(0, 42);
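The "same inputs, same output" property described above can be illustrated outside of iocaine with a few lines of Python. This is a sketch under stated assumptions, not a description of how iocaine derives its seed: the choice of host, path and group as seed material, the SHA-256 hashing, and the helper names are all hypothetical.

```python
import hashlib
import random


def rng_from_seed(seed):
    """Build a generator whose whole sequence is fixed by the seed string."""
    digest = hashlib.sha256(seed.encode("utf-8")).digest()
    return random.Random(int.from_bytes(digest, "big"))


def rng_from_request(host, path, group):
    """Assumption: the seed is derived from the host, the path and a group name."""
    # NUL separators keep ("ab", "c") and ("a", "bc") from producing the same seed.
    return rng_from_seed("\0".join((host, path, group)))


# Two generators built from identical inputs produce identical numbers,
# so no state has to be kept between requests.
a = rng_from_request("example.com", "/some/page", "default")
b = rng_from_request("example.com", "/some/page", "default")
first = [a.randint(0, 42) for _ in range(5)]
second = [b.randint(0, 42) for _ in range(5)]
```

Here first and second are equal, which is what lets a crawler revisit the same URL and receive the same garbage. Note that randint includes both bounds; that is a property of this sketch, not a statement about in_range.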
Markov chains
One of the textual garbage generators is one based on Markov chains. This generator will generate a paragraph of text, based on the probability of words following each other, as learned from a training corpus. The output is mostly nonsensical, but given a large training set, it produces more sensible output than stringing purely random words together. It usually has more structure, and produces more plausible results.
There are two ways to train a Markov generator: you can either give it a list of files, or ask it to train on the built-in corpus. The built-in corpus is iocaine’s own source code. It’s not a very good corpus: it looks vaguely Rust-y, and is very far from human-written text. It’s strongly recommended to use your own corpus. The more the merrier.
Examples
Lua
-- At init time:
local MARKOV = iocaine.generator.Markov("path1", "path2")
local MARKOV_DEFAULT = iocaine.generator.Markov()
-- At output generation time:
local rng = iocaine.generator.Rng:from_request(request, "default")
local paragraph = MARKOV:generate(rng, rng:in_range(8, 64))
Fennel
;; At init time:
(local MARKOV (iocaine.generator.Markov "path1" "path2"))
(local MARKOV-DEFAULT (iocaine.generator.Markov))
;; At output generation time:
(local rng (iocaine.generator.Rng:from_request request "default"))
(local paragraph (MARKOV:generate rng (rng:in_range 8 64)))
Roto
# At init time:
let sources = StringList.new();
sources.push("path1");
sources.push("path2");
globals.add("MARKOV", MarkovChain.new(sources)?);
globals.add("MARKOV_DEFAULT", MarkovChain.default());
# At output generation time:
let rng = rng.from_request(request, "default");
let paragraph = MARKOV.generate(rng, rng.in_range(8, 64));
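For readers unfamiliar with the technique, the following Python sketch shows the general idea of a word-level Markov chain: learn which words follow which, then walk those transitions with a seeded generator. It is a deliberately tiny first-order model with hypothetical function names, and says nothing about how iocaine's own generator is implemented or tuned.

```python
import random
from collections import defaultdict


def train(corpus):
    """Map each word to the list of words seen directly after it."""
    words = corpus.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        # Duplicates are kept on purpose: a pair seen more often
        # is proportionally more likely to be picked later.
        chain[current].append(following)
    return dict(chain)


def generate(chain, rng, count):
    """Produce count words (at least one) by following learned transitions."""
    word = rng.choice(sorted(chain))
    out = [word]
    while len(out) < count:
        followers = chain.get(word)
        if followers:
            word = rng.choice(followers)
        else:
            # Dead end (a word only seen at the very end): restart anywhere.
            word = rng.choice(sorted(chain))
        out.append(word)
    return " ".join(out)


chain = train("the cat sat on the mat and the dog sat on the cat")
text = generate(chain, random.Random(42), 8)
```

Because the generator is seeded, calling generate again with random.Random(42) yields the same text. A larger corpus gives each word more possible followers, which is why a bigger training set produces more varied and more plausible output.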
Random words
The other textual garbage generator is a simple word chooser. It can choose a small number of words from a wordlist, completely randomly, and join them with a separator. The intended use case is for generating hrefs and other URL-like things, where we don’t need a Markov chain, just a string of words.
Like the Markov chain, the wordlist generator can be trained on files, or use the built-in source: all the unique words in iocaine’s source code. It is highly recommended to use a better wordlist. Any file will do, it will be split by whitespace, and unique words will become part of the set. For best effect, use a language-specific word list.
Examples
Lua
-- At init time:
local WORDLIST = iocaine.generator.WordList("path1", "path2")
local WORDLIST_DEFAULT = iocaine.generator.WordList()
-- At output generation time:
local rng = iocaine.generator.Rng:from_request(request, "default")
local href = WORDLIST:generate(rng, rng:in_range(1, 3), "-")
Fennel
;; At init time:
(local WORDLIST (iocaine.generator.WordList "path1" "path2"))
(local WORDLIST-DEFAULT (iocaine.generator.WordList))
;; At output generation time:
(local rng (iocaine.generator.Rng:from_request request "default"))
(local href (WORDLIST:generate rng (rng:in_range 1 3) "-"))
Roto
# At init time:
let sources = StringList.new();
sources.push("path1");
sources.push("path2");
globals.add("WORDLIST", WordList.new(sources)?);
globals.add("WORDLIST_DEFAULT", WordList.default());
# At output generation time:
let rng = rng.from_request(request, "default");
let href = WORDLIST.generate(rng, rng.in_range(1, 3), "-");
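The word chooser is simple enough to sketch in a few lines of Python. The sketch below follows the description above (split on whitespace, keep unique words, pick a few at random, join with a separator); the function names are hypothetical, and iocaine's actual implementation may differ in its details.

```python
import random


def load_words(text):
    """Split on whitespace and keep each word once, in a stable order."""
    return sorted(set(text.split()))


def generate(words, rng, count, separator):
    """Pick count words at random (repeats allowed) and join them."""
    return separator.join(rng.choice(words) for _ in range(count))


words = load_words("alpha beta\ngamma beta alpha")
slug = generate(words, random.Random(7), 3, "-")
```

Sorting the set matters for reproducibility in this sketch: without a stable order, the same seed could select different words from one run to the next.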
QR Codes
One can generate valid QR codes from within iocaine, either in PNG or SVG format. This is one of the more expensive generators, use it with care! QR codes are generated from a string, and a size - the latter determines the size of the image, which is always a square.
Examples
Lua
local qr_png = iocaine.generator.QRCode.Png("content", 64)
local qr_svg = iocaine.generator.QRCode.Svg("content", 64)
local data = qr_svg:data()
local base64 = qr_png:base64()
Fennel
(local qr-png (iocaine.generator.QRCode.Png "content" 64))
(local qr-svg (iocaine.generator.QRCode.Svg "content" 64))
(local data (qr-svg:data))
(local base64 (qr-png:base64))
Roto
let qr_png = QRCode.generate_png("content", 64)?;
let qr_svg = QRCode.generate_svg("content", 64)?;
let data = qr_svg.as_string(); # or .as_binary()
let base64 = qr_png.as_base64();
Fake JPEGs
Inspired by Alun Jones’ fakejpeg Python library, iocaine lets you generate structurally valid, but otherwise completely fake JPEGs. Unlike QR codes, this is cheap, because the output, while structurally valid, is entirely made up of random junk. Most decoders will happily decode and display them, however, even if they have small errors here and there.
Generating such fakes requires pre-training using the fakejpeg-cli utility. iocaine will not perform the training itself, as it can take a long time. Instead, it relies on fakejpeg-cli’s binary dump of the training data.
To prepare such a file, install the command line tool, find a fair number of JPEGs to train on, and run the training:
# fakejpeg train -o training-data.cbor /path/to/jpegs/*.jpg /path/to/jpegs/*.jpeg
The training-data.cbor file is what iocaine will require. It’s a dump of the structure of the JPEGs it was trained on. It contains no comments, no image data, just the structure, and image sizes.
When generating a fake JPEG, the generator allows setting an optional comment. You can use a Markov Chain to generate said comment, or set a static one, or none at all. The output of the generator is always a binary blob.
Examples
Lua
-- At init time:
local FAKEJPEGS = iocaine.generator.FakeJpeg("training-data.cbor")
-- At output generation time:
local jpeg = FAKEJPEGS:generate("optional comment")
Fennel
;; At init time:
(local FAKEJPEGS (iocaine.generator.FakeJpeg "training-data.cbor"))
;; At output generation time:
(local jpeg (FAKEJPEGS:generate "optional comment"))
Roto
# At init time:
globals.add("FAKEJPEGS", FakeJpeg.new("training-data.cbor")?);
# At output generation time:
let jpeg = FAKEJPEGS.generate("optional comment");
Response generation
When iocaine generates output, it generates an entire HTTP response, with status code, headers, body and all. To be able to do that, the scripting environment provides ways to construct an HTTP response, building it up from parts iteratively. It also provides a method to minify said output, if it is HTML or CSS - but iocaine does not verify the content type, it expects the script to minify only when appropriate.
For Lua and Fennel, the constructor is iocaine.Response, for Roto, it is ResponseBuilder.new(). See the examples below!
Examples
Lua
A Response object has the following fields:
response.status: The numeric HTTP response status. This can be read, and set too.
response.body: The body of the HTTP response. This can be read and set too.
response.content_length: The length of the response body. This is a read-only field.
In addition, the following methods are available on a Response object:
response:header(name): Returns the name header, or nil if not found.
response:set_header(name, value): Sets the name header to value.
response:set_headers_from(table): Adds all the headers from table. It does not remove any headers, it can only add new ones.
response:minify(): Minifies the response body. Should be called only if the response is HTML or CSS. Calling it on any other content type is undefined.
function output(request, decision)
local response = iocaine.Response()
response.status = 200
response.body = "hello world!"
response:set_header("content-type", "text/plain")
return response
end
Fennel
A Response object has the following fields:
response.status: The numeric HTTP response status. This can be read, and set too.
response.body: The body of the HTTP response. This can be read and set too.
response.content_length: The length of the response body. This is a read-only field.
In addition, the following methods are available on a Response object:
(response:header name): Returns the name header, or nil if not found.
(response:set_header name value): Sets the name header to value.
(response:set_headers_from table): Adds all the headers from table. It does not remove any headers, it can only add new ones.
(response:minify): Minifies the response body. Should be called only if the response is HTML or CSS. Calling it on any other content type is undefined.
(fn output [request decision]
(local response (iocaine.Response))
(set response.status 200)
(set response.body "hello world")
(response:set_header "content-type" "text/plain")
response)
Roto
A ResponseBuilder object has the following methods:
response.status_code(code): Sets the numeric HTTP response status.
response.header(name, value): Sets the header name’s value to value.
response.content_length(): Returns the size of the response body in bytes.
response.body_from_string(body): Sets the body of the response to the given string.
response.body_from_binary(body): Sets the body of the response to the given binary blob. This can be used to serve QR code images or fake JPEGs, for example.
response.minify(): Minifies the response body. Should be called only if the response is HTML or CSS. Calling it on any other content type is undefined.
response.build(): Builds a Response out of the ResponseBuilder.
The ResponseBuilder has no getters apart from response.content_length(). The status code and response headers can be accessed via a Response object, built with response.build(). The Response object provides the following methods:
final_response.status_code() returns the HTTP response status.
final_response.header(name) returns the named header, or an empty string if not found.
final_response.body_as_string() returns the HTTP body as a lossily serialized UTF-8 string.
fn output(request: Request, decision: String?) -> Response? {
let response = ResponseBuilder.new();
response.status_code(200);
response.body_from_string("hello world!");
response.header("content-type", "text/plain");
Some(response.build())
}
Templates
To ease generating textual output, iocaine provides a minimalistic templating system, so you don’t have to string strings together like it’s 1981. The template syntax is jinja-like, you can read more about it here. It works by compiling a template ahead of time, then, whenever you wish to render textual output using it, you give it some context, generated by the handler script, and render it. The context can be any kind of value the language supports.
Examples
Lua
The template engine can be created via iocaine.TemplateEngine(). The created engine then has the following methods:
engine:compile(string): Compiles string to a template, which can be used with engine:render() later. Returns nil on error.
engine:compile_file(path): Compiles the file pointed to by path to a template, which can be used with engine:render() later. Returns nil on error.
engine:render(template, context): Renders a previously compiled template, using the given context. The context can be any type of variable, but it is usually a table.
local ENGINE = iocaine.TemplateEngine()
local TEMPLATE = ENGINE:compile("Hello {{ whom }}!")
function output(request, decision)
local response = iocaine.Response()
response.status = 200
response.body = ENGINE:render(TEMPLATE, {["whom"] = "world"})
response:set_header("content-type", "text/plain")
return response
end
Fennel
The template engine can be created via (iocaine.TemplateEngine). The created engine then has the following methods:
(engine:compile string): Compiles string to a template, which can be used with (engine:render) later. Returns nil on error.
(engine:compile_file path): Compiles the file pointed to by path to a template, which can be used with (engine:render) later. Returns nil on error.
(engine:render template context): Renders a previously compiled template, using the given context. The context can be any type of variable, but it is usually a table.
(local ENGINE (iocaine.TemplateEngine))
(local TEMPLATE (ENGINE:compile "Hello {{ whom }}"))
(fn output [request decision]
(local response (iocaine.Response))
(set response.status 200)
(set response.body (ENGINE:render TEMPLATE {:whom "world"}))
(response:set_header "content-type" "text/plain")
response)
Roto
The template engine can be created via TemplateEngine.new(). The created engine then has the following methods:
engine.compile(string): Compiles string to a template, which can be used with engine.render() later. Returns None on error.
engine.compile_file(path): Compiles the file pointed to by path to a template, which can be used with engine.render() later. Returns None on error.
engine.render(template, context): Renders a previously compiled template, using the given context. The context can be any type of Value, but it is usually a HashMap.
fn init(metrics: Metrics) -> ()? {
let engine = TemplateEngine.new();
globals.add("ENGINE", engine.as_global());
let template = engine.compile("Hello {{ whom }}")?;
globals.add("TEMPLATE", template.as_global());
Some(())
}
fn output(request: Request, decision: String?) -> Response? {
let context = HashMap.new();
context.insert_str("whom", "world");
let response = ResponseBuilder.new();
response.status_code(200);
response.header("content-type", "text/plain");
response.body_from_string(ENGINE.render(TEMPLATE, context));
Some(response.build())
}
Metrics
iocaine itself does not keep metrics, doing so is delegated to the scripts it runs. In practice, this means that when the scripts are initialized, they should set up the counters they wish to use, register them, and - optionally - update them based on previously persisted values.
The scripting environment provides a metric registry, and a set of loaded metrics. For Lua and Fennel, these are available under the iocaine.metrics table. For Roto, the init function is called with a metrics argument, which wraps both the registry, and loaded metrics.
When registering a metric, it will need to have a name, a description, and an optional set of labels.
See the language specific examples below!
Examples
Lua
-- At init time:
local requests = iocaine.metrics.registry:new_counter(
"example_requests", "number of requests received", "host"
)
iocaine.metrics.loaded:update(requests)
local garbage_generated = iocaine.metrics.registry:new_counter(
"example_garbage_generated", "amount of garbage generated (in bytes)", "host"
)
iocaine.metrics.loaded:update(garbage_generated)
-- At a later time:
requests:inc(request:header("host"))
garbage_generated:inc_by(response.content_length, request:header("host"))
Fennel
;; At init time:
(local requests
(iocaine.metrics.registry:new_counter
"example_requests" "number of requests received" "host"))
(iocaine.metrics.loaded:update requests)
(local garbage-generated
(iocaine.metrics.registry:new_counter
"example_garbage_generated" "amount of garbage generated (in bytes)" "host"))
(iocaine.metrics.loaded:update garbage-generated)
;; At a later time:
(requests:inc (request:header "host"))
(garbage-generated:inc_by response.content_length (request:header "host"))
Roto
Because Roto does not support variadic arguments, the runtime library supplies a small selection of variants for the metric incrementing functions. counter.inc() will increment a label-less counter, counter.inc_for1(label) will increment a counter with one label. There are such helpers up to inc_for4(label1, label2, label3, label4). If you have a counter with more than four labels, inc_for(labels) can be used, which takes a StringList as its argument.
Similar helpers exist for inc_by(amount): inc_by_for1(amount, label), and so on.
# At init time:
let requests = metrics.registry().new_counter(
"example_requests", "number of requests received",
StringList.new().push("host")
);
globals.add("METRIC_REQUESTS", requests.as_global());
metrics.loaded().update(requests);
let garbage_generated = metrics.registry().new_counter(
"example_garbage_generated", "amount of garbage generated (in bytes)",
StringList.new().push("host")
);
globals.add("METRIC_GARBAGE_GENERATED", garbage_generated.as_global());
metrics.loaded().update(garbage_generated);
# At a later time:
METRIC_REQUESTS.inc_for1(request.header("host"));
METRIC_GARBAGE_GENERATED.inc_by_for1(response.content_length(), request.header("host"));
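As a language-neutral illustration of what a labelled counter is, the Python sketch below keeps one running total per combination of label values, and can start from previously saved totals. The Counter class and its restore method are hypothetical stand-ins for the registry and the loaded metrics described above; they only model the idea and make no claim about how iocaine stores or persists its metrics.

```python
class Counter:
    """A counter family: one running total per combination of label values."""

    def __init__(self, name, description, *labels):
        self.name = name
        self.description = description
        self.labels = labels
        self.values = {}

    def inc_by(self, amount, *label_values):
        # Every increment must supply exactly one value per declared label.
        if len(label_values) != len(self.labels):
            raise ValueError("expected %d label value(s)" % len(self.labels))
        self.values[label_values] = self.values.get(label_values, 0) + amount

    def inc(self, *label_values):
        self.inc_by(1, *label_values)

    def restore(self, persisted):
        """Start from previously saved totals instead of from zero."""
        self.values.update(persisted)


requests = Counter("example_requests", "number of requests received", "host")
requests.restore({("example.com",): 10})  # hypothetical persisted state
requests.inc("example.com")
requests.inc_by(5, "other.example")
```

The variadic label arguments in this sketch correspond to what Lua and Fennel offer directly; Roto's inc_for1 through inc_for4 helpers cover the same ground with fixed argument counts.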
Logging
The runtime environment provides a number of logging functions, for various log levels. Most of these use the tracing crate, and will emit logs using the iocaine::user target. A stdout method is also available, to print the log message to standard output, bypassing the tracing crate.
For Lua and Fennel, logging methods are available in the iocaine.log table: iocaine.log.trace, iocaine.log.debug, iocaine.log.info, iocaine.log.warn, iocaine.log.error, and iocaine.log.stdout. All of these serialize their argument into JSON first.
For Roto, the runtime environment provides the Logger type, with similarly named static methods: Logger.trace, Logger.debug, Logger.info, Logger.warn, Logger.error, and Logger.stdout. Unlike Lua and Fennel, these methods take a string, and do not automatically serialize to JSON.
Examples
Lua
iocaine.log.debug({["hello"] = "world"})
iocaine.log.stdout("Hello world")
Fennel
(iocaine.log.debug {:hello "world"})
(iocaine.log.stdout "Hello world")
Roto
let message = HashMap.new();
message.insert_str("hello", "world");
Logger.debug(message.into_value().to_json()?);
Logger.stdout("Hello world");