Roto in iocaine reference guide

  Important

This documentation is a work in progress, and while the hope is that it provides an adequate reference of the functionality the Roto engine provides, it does not teach you the language. For the time being, look at the See Also section for examples to learn from.

This reference documents the iocaine 2.5.1 release. Some of the functionality is not available in earlier versions.

Overview

Starting with iocaine 2.2.0, it is possible to configure a custom request handler, written in one of several languages. The first and most performant one is Roto, a statically typed, compiled language. It is fast and efficient, but also more limited than the other options, and certainly less familiar.

The main entry point of the script is a decide function, which takes a single argument, the incoming request, and returns a verdict with the Outcome. A minimal implementation that accepts everything to serve it garbage (mimicking the default behaviour without a custom request handler) looks like this:

function decide(request: Request) -> Verdict[Outcome, Outcome] {
  accept Outcome.garbage()
}

It is also possible to run a function once, on startup, to pre-compile patterns and regexes for example, so that work won’t need to be performed for each and every request. To facilitate this, iocaine looks for an init function:

function init() -> Verdict[Unit, String] {
  accept
}

As Roto does not support global variables, anything you compile at init() time and want to keep needs to be put into one of the variables the Roto engine makes available for scripts - see below!

Types and variables available in the runtime

There are a number of types and variables exposed in the request handler runtime, largely matching the core features provided by all engines. These are documented below.

Request

Each request iocaine handles will pass through the decide function, which takes a single parameter, of the Request type. This is a read-only representation of the incoming request, and has the following methods:

Additionally, for logging purposes, the Request type can convert its headers, cookies, and query parameters into metadata using the request.headers_into_metadata(<metadata>, <prefix>), request.cookies_into_metadata(<metadata>, <prefix>), and request.queries_into_metadata(<metadata>, <prefix>) functions. When inserting into the metadata map, each header, cookie, or query name will be prefixed by <prefix>.

if request.query("log-me") == "yes" {
  request.headers_into_metadata(metadata, "request.header.");
}

The above snippet will add request headers to the metadata map, each header prefixed with request.header.. As such, the user-agent header will, in this case, end up as request.header.user-agent.

Outcome

An Outcome - combined with the verdict whether to accept or reject a request - is the result of the decision the request handler makes, and is used as the return value of the decide() function. It can be Outcome.garbage(), Outcome.challenge(), or Outcome.not_for_us().

The first two are to be used with an accept verdict, while the last one is to be used with reject. Using accept with Outcome.not_for_us(), or reject with anything other than that, results in an internal server error. Don’t do that.
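
As a sketch of how outcomes and verdicts pair up, the following decide function returns each of the three outcomes with the verdict it belongs to. The query parameter names are hypothetical, and only serve as illustration:

function decide(request: Request) -> Verdict[Outcome, Outcome] {
  # Hypothetical rule: not_for_us() must be paired with reject.
  if request.query("example-skip") == "yes" {
    reject Outcome.not_for_us()
  }
  # Hypothetical rule: challenge() must be paired with accept.
  if request.query("example-challenge") == "yes" {
    accept Outcome.challenge()
  }
  # Everything else is accepted, and served garbage.
  accept Outcome.garbage()
}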

Metadata

The Metadata type is a key-value map, primarily used during logging. This makes it possible to attach arbitrary key-value pairs to log messages.

To construct them, two functions are available:

To add new keys to the map, use metadata.with(<key>, <value>) - which will return the metadata instance itself, so it can be chained. However, chaining will eventually exhaust the stack, so only do that if you have a handful of pairs to add.

Should you wish to retrieve the value of a key, metadata.get(<key>) will let you do that. It returns the value if the key is found, or an empty string if it is not.

MutableStringList

Because Roto does not support lists, nor variadic arguments, if any function needs to receive a list of arguments, those arguments need to be constructed first, using a specific type of their own. For a list of strings, that’s MutableStringList. It is not advised to use these outside of init()-time setup.

For construction purposes, the following functions can be used:

let list = MutableStringList.new();
list.push("foo").push("bar");

It is also possible to join the elements of a string list with a separator: list.join(<separator>) will do just that:

let list = MutableStringList.new();
list.push("foo").push("bar");

let result = list.join(", "); # result == "foo, bar"

iocaine_patterns

To match substrings effectively, the Roto engine provides access to the iocaine_patterns variable. This - under the hood - is a name-value map, allowing you to collect patterns into a “variable”: Roto does not support global variables, so to be able to refer to things we compiled at init() time, we have to put them into an iocaine-provided variable, such as iocaine_patterns.

Under the hood, each key has an associated PatternFinder, which does the bulk of the work, the actual matching. To insert finders into the map, you can use either of these two functions:

Which one to use depends on circumstances. These are generally run at init() time, and the performance is the same, whether you turn a string list into a finder, or iocaine does so under the hood. Use whichever feels more natural.

To retrieve a key, one should use iocaine_patterns.get(<key>). If a key is not found, this returns a finder that will always fail.
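
Putting the pieces together, a minimal sketch of init()-time setup might look like the following. It assumes that insert_patterns() takes a name and a string list, as it does in the JSON parsing example of this guide. The key and the pattern strings are made up for illustration:

function init() -> Verdict[Unit, String] {
  # Build the list of substrings to look for (illustrative values).
  let agents = MutableStringList.new();
  agents.push("ExampleBot").push("ExampleCrawler");

  # Store them under a name, so that decide() can retrieve the finder later.
  iocaine_patterns.insert_patterns("example.agents", agents);
  accept
}

At request time, iocaine_patterns.get("example.agents") would then return the finder built from this list.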

PatternFinder

A PatternFinder is the AhoCorasick instance that does the actual pattern matching. An instance of it can be directly constructed from a MutableStringList with PatternFinder.new(<string-list>), or indirectly through iocaine_patterns.insert_patterns(). Once constructed, or retrieved via iocaine_patterns.get(), it provides one method only:

iocaine_regexes

For regular expression matching, the Roto engine provides access to the iocaine_regexes variable. This - under the hood - is a name-value map, allowing you to collect compiled regexes into a “variable”: Roto does not support global variables, so to be able to refer to things we compiled at init() time, we have to put them into an iocaine-provided variable, such as iocaine_regexes.

Note that this variable allows inserting a single regexp into the map! If you wish to match multiple, possibly overlapping regexes against the same string, use iocaine_regexsets instead, which is much more efficient. This variable should only be used if you want to extract capture groups, or there’s only a single expression in the set.

Under the hood, each key has an associated RegexFinder, which does the bulk of the work, the actual matching, and capture group extraction, if need be. To insert finders into the map, you can use either of these two functions:

Which one to use depends on circumstances. These are generally run at init() time, and the performance is the same, whether you turn a string into a finder, or iocaine does so under the hood. Use whichever feels more natural.

To retrieve a key, one should use iocaine_regexes.get(<key>). If a key is not found, this returns a finder that will always fail.

RegexFinder

A RegexFinder is the Regex instance that does the actual pattern matching. An instance of it can be directly constructed from a string with RegexFinder.new(<string>), or indirectly through iocaine_regexes.insert_regex(). Once constructed, or retrieved via iocaine_regexes.get(), it provides the following methods:

iocaine_regexsets

Available since iocaine 2.5.0.

Similar to iocaine_regexes, iocaine_regexsets provides an efficient way to match multiple regexes against a single string. The major difference between the two is that iocaine_regexsets allows matching multiple regexes in a single pass, while iocaine_regexes can match only one - but it can also extract capture groups, which regex sets can’t.

This is most useful when you don’t need to extract captures, and you have multiple sets of regular expressions you wish to match.

The variable provides two methods:

The finder has a single method:

iocaine_networks

Available since iocaine 2.3.0.

The iocaine_networks variable allows one to store named network sets they can later use to compare IP addresses against. This makes it possible to efficiently check whether an IP address falls into a given range, or range sets (for example, an entire ASN!).

The variable has two methods:

To match an IP address against a set of networks, retrieve the finder via iocaine_networks.get(), and use its finder.contains(<ip-addr>) to do the matching. This function takes an IP address, and returns true if the address is part of any of the networks contained within the finder, false otherwise.

Roto natively supports IP addresses, so they can be used as literals:

if iocaine_networks.get("local").contains(127.0.0.1) {
  reject Outcome.not_for_us()
}

PrefixList

Because Roto does not support lists, nor variadic arguments, if any function needs to receive a list of arguments, those arguments need to be constructed first, using a specific type of their own. For a list of IP ranges in CIDR notation, this specific type is PrefixList. It is not advised to use these outside of init()-time setup.

For construction purposes, the following functions can be used:

let list = PrefixList.new();
list.push("192.168.0.0/24").push("10.0.0.0/16");

metrics

Available since iocaine 2.3.0.

The metrics variable provides access to the iocaine_request_handler_hits metric, and has a single method: .inc(<id>).

This will increase the metric with the <id> label by one. The intended use is to create script-specific metrics, for example, to count how many times a given ruleset was hit.
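
A minimal sketch of counting hits for one rule could look like this. The query parameter and the label are hypothetical names, picked only for the example:

function decide(request: Request) -> Verdict[Outcome, Outcome] {
  if request.query("log-me") == "yes" {
    # Count how often this (hypothetical) rule was hit.
    metrics.inc("example.log-me");
  }
  accept Outcome.garbage()
}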

Logger

Logger provides a primitive framework for logging. It lets one build a map of key-value pairs, and output them in JSON format to standard output. It also contains a message, and information about the outcome of evaluation.

There are multiple ways of constructing a Logger object:

An existing Logger instance has the following methods:

Env

With Env.get(<name>), a request handler script can retrieve the value of the <name> environment variable (or an empty string, if the variable isn’t set). This can be used to allow some limited configuration of a reusable request handler, through environment variables.
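
As an illustration, a handler could switch its behaviour based on an environment variable. This is only a sketch: EXAMPLE_HANDLER_MODE is a made-up variable name, and because Env.get() returns an empty string for unset variables, the comparison below simply fails when it is not set:

function decide(request: Request) -> Verdict[Outcome, Outcome] {
  # EXAMPLE_HANDLER_MODE is a hypothetical variable name.
  if Env.get("EXAMPLE_HANDLER_MODE") == "challenge" {
    accept Outcome.challenge()
  }
  # Unset, or set to anything else: serve garbage.
  accept Outcome.garbage()
}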

JSON parsing

The Roto engine provides very limited support for loading and working with JSON files: you can load a file, and extract the keys from it into a string list:

let ai_robots_txt = Json.load_file("/some/path/robots.json").get_keys();
iocaine_patterns.insert_patterns("ai.robots.txt", ai_robots_txt);

Other extensions to the Roto language

There are a few small things the Roto runtime within iocaine provides for types native to Roto: for example, it is possible to split a string into a list with the split_by(<delimiter>) method:

let list = "foo/bar".split_by("/"); # list = ["foo", "bar"]

Since iocaine 2.5.0, it is also possible to try and parse a string as a Sec-CH-UA header:

let secchua = request.header("sec-ch-ua").as_secchua();
if secchua.contains_item("Chrome") {
  accept Outcome.garbage()
}

As can be seen in the example, if the string is successfully parsed, the returned object will have a .contains_item(<key>) method, which does what its name suggests.

Testing

  Important

Documenting the testing support is a work in progress. Please look at examples in the See Also section for more information for the time being.

Roto supports embedding tests right in the Roto scripts using the following pattern:

test name_of_the_test {
  # do some checks
  accept
}

The tests should accept when passing, and reject when failing.
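
As a small sketch of what such a test might check, the following one verifies the result of MutableStringList's join(), using the values from the example earlier in this guide. The test name is arbitrary:

test join_uses_separator {
  let list = MutableStringList.new();
  list.push("foo").push("bar");

  # Pass only if the joined string is what we expect.
  if list.join(", ") == "foo, bar" {
    accept
  }
  reject
}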

See also

For a complete example, see Nam-Shub of Enki, the bot detection & classification system of iocaine’s author. It is used in production with great success.

For smaller examples, there are Roto tests in the source code, too. Very basic, not particularly great as a starting point, but they perhaps provide an overview of the language.

There’s also a short Getting Started guide, with a complete request handler example in Roto. More useful than the tests as a starting point, less complex and featureful than Nam-Shub of Enki.