Monitoring reactive projects with Kamon.io

Laurent Valdes
Powerspace Engineering
6 min read · Feb 12, 2016

--

The life of a project is long and risky in an ad-tech company like Powerspace. It is punctuated by numerous production deployments, new features, and new technologies. Whatever the skills of the engineers, the possibility of failure exists at every production operation. When you serve millions of requests each day, you have to be careful. Being fast does not matter much if your code is not safe.

Some context

Some time ago, I read an interesting post-mortem about a finance company named Knight Capital Group. You may have heard of it. In 2012, they deployed a badly configured high-frequency trading algorithm. For more than 45 minutes, it flooded the New York Stock Exchange with erroneous stock orders. This generated a $440M loss and led to the company’s collapse.

The question is: how do you stay safe when you also want to be fast and “High-Frequency-savvy”? On the JVM platform, many frameworks and programming practices are implementations of what is called reactive programming. Reactive programming follows a manifesto: The Reactive Manifesto. It has been explained many times, but one property of reactive systems caught my eye: resilience.

You don’t become resilient by magic; you become resilient because you measure. You measure when you lose a database or a network connection, or when you cannot send data to a Big Data ingestion pipeline. This allows the system to make decisions and stay reactive, staying away from the point of no return. Additionally, you reach resilience by automating everything. In computing, every repeated manual task can lead to human mistakes and to a lack of reactivity. Consequently, the more you automate, the more resilient you are. In other words: be DevOps.

At Powerspace we have tried several monitoring frameworks, and we recently came across an interesting one: kamon.io. We chose it because we already use Akka, and because Kamon offers good support for this stack. While the Kamon team offers pretty good tutorials and extensive documentation, we also wanted to share our experience.

What exactly is Kamon?

Kamon is a monitoring system for the JVM. It was built to monitor fast applications that deliver sustained performance. It works in two modes: AspectJ instrumentation and explicit tracing.

  • AspectJ is a framework whose goal is to add aspects to an already existing application, without modifying its code. This is called Aspect-Oriented Programming. In our case, aspects will be used to add monitoring to our application. Some other Java tools work this way, such as Dynatrace and New Relic.
  • Explicit tracing is when you declare performance metrics in the source code of your application. It is interesting when you want to polish the monitoring of your application; see the sketch just after this list. Some other Java tools work this way, such as Codahale (Dropwizard) Metrics.
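To make the difference concrete, here is a minimal sketch of explicit tracing against the Kamon 0.5.x API; the API has changed across releases, so treat the exact calls and metric names as assumptions to verify against the documentation of your version:

package com.powerspace.kamon;

import kamon.Kamon;

public class ExplicitTracingExample {

    public static void main(String[] args) throws InterruptedException {
        Kamon.start(); // boots Kamon and its configured reporter modules

        for (int i = 0; i < 4; i++) {
            long start = System.nanoTime();
            Thread.sleep(2000); // our lengthy operation
            // Explicitly count each call and record how long it took.
            Kamon.metrics().counter("longOperation").increment();
            Kamon.metrics().histogram("longOperation-time").record(System.nanoTime() - start);
        }

        Kamon.shutdown(); // flushes and stops the reporters
    }
}

The annotation mode used in the rest of this post achieves the same result without touching the method bodies.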

Installing Kamon

Kamon can be installed on any piece of Java software; we decided to use SBT. Kamon also supports multiple metric collectors and various backend services.

Our stack will be the following:

  • Our software will use SBT and Java 8. We can collect such useful metrics as the usage of Thread.sleep().
  • Our stack will use statsd as the collection system (a sample configuration follows this list).
  • And because simplicity is the ultimate sophistication, we will use Zabbix.
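Kamon is configured through Typesafe Config, just like Akka. The statsd module already defaults to sending to localhost:8125, so nothing is strictly required here; still, here is a minimal sketch of an application.conf, assuming the 0.5.x configuration keys (check the Kamon documentation for your version):

kamon {
  statsd {
    # Where the statsd daemon listens; these values are also the defaults.
    hostname = "127.0.0.1"
    port = 8125
  }
}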

So here is the code of our fabulous finance application :-)

Some Java code

package com.powerspace.kamon;

import kamon.annotation.EnableKamon;
import kamon.annotation.Count;

/**
 * Powerspace code created on 22/01/16.
 */
@EnableKamon // ❶
public class FastHighFrequencyTradingSoftwareThatIsMonitored {

    public static void main(String[] args) throws InterruptedException {
        FastHighFrequencyTradingSoftwareThatIsMonitored test =
                new FastHighFrequencyTradingSoftwareThatIsMonitored();
        while (true) {
            System.out.println(String.format("My long operation yielded %s", test.longOperation()));
        }
    }

    @Count("longOperation") // ❷
    public int longOperation() throws InterruptedException {
        Thread.sleep(2000); // simulates a lengthy computation
        return 1;
    }
}

❶ is the annotation that tells Kamon to activate metrics collection on this class.
❷ is the annotation that tells Kamon to count the invocations of our lengthy operation.
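Note that @Count only counts invocations. If memory serves, the kamon-annotation module also ships companion annotations such as @Time, which records the execution time of the method into a histogram; treat the exact names below as assumptions to check against your Kamon version:

@Time("longOperation-time") // records how long each call takes
@Count("longOperation")     // counts how many calls are made
public int longOperation() throws InterruptedException {
    Thread.sleep(2000);
    return 1;
}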

And then some SBT (Scala) code

enablePlugins(SbtNativePackager)
enablePlugins(JavaAppPackaging)

name := "kamon-powerspace"
version := "1.0"
scalaVersion := "2.11.7"

libraryDependencies ++= Seq(
  "io.kamon" %% "kamon-core" % "0.5.2",
  "io.kamon" %% "kamon-annotation" % "0.5.2",
  "io.kamon" %% "kamon-statsd" % "0.5.2",
  "io.kamon" %% "kamon-log-reporter" % "0.5.2", // ❶
  // AspectJ weaver
  "org.aspectj" % "aspectjweaver" % "1.8.8", // ❷
  "ch.qos.logback" % "logback-classic" % "1.1.3"
)

javacOptions ++= Seq("-source", "1.8")

packageDescription := "The powerspace kamon thing"
mainClass in Compile := Some("com.powerspace.kamon.FastHighFrequencyTradingSoftwareThatIsMonitored")

// Ship the weaver jar inside the distribution, next to the app.
mappings in Universal ++= Seq(
  findJarFromUpdate("aspectjweaver", update.value) -> "aspectj/aspectjweaver.jar"
)

// ❹ Extra flags added to the java command line by the generated start script.
bashScriptExtraDefines ++= Seq("addJava -javaagent:${app_home}/../aspectj/aspectjweaver.jar")
javaOptions in Universal += "-Dkamon.auto-start=true"

// ❸ Crawls all resolved jar packages and returns the AspectJ one.
def findJarFromUpdate(jarName: String, report: UpdateReport): File = {
  val filter = artifactFilter(name = jarName + "*", extension = "jar")
  val matches = report.matching(filter)
  if (matches.isEmpty) {
    val err: (String => Unit) = System.err.println
    err("can't find jar file in resources named " + jarName)
    err("unfiltered jar list:")
    report.matching(artifactFilter(extension = "jar")).foreach(x => err(x.toString))
    sys.error("can't find jar file in resources named " + jarName)
  } else {
    matches.head
  }
}

❶ The list of Kamon modules we are going to use:

  • The annotation module, which we use inside the Java code.
  • The module that streams metrics to statsd.
  • The module that streams all metrics to the program console.

❷ The AspectJ weaver, the tool that transforms the bytecode in order to perform the instrumentation.
❸ The Scala method that crawls all resolved jar packages and returns the AspectJ one.
❹ The execution flags added to the java command, in order to:
* load the AspectJ weaver as a Java agent
* automatically start Kamon monitoring
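Once packaged, the start script generated by sbt-native-packager effectively launches something like the following (a hedged reconstruction with abbreviated paths, not the literal script):

powerspace$ java -javaagent:kamon-powerspace-1.0/aspectj/aspectjweaver.jar \
  -Dkamon.auto-start=true \
  -cp "kamon-powerspace-1.0/lib/*" \
  com.powerspace.kamon.FastHighFrequencyTradingSoftwareThatIsMonitored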

Discovering the result

When we launch it, it yields some impressive results:

powerspace$ sbt dist && unzip target/universal/kamon-powerspace-1.0.zip && \
sh kamon-powerspace-1.0/bin/kamon-powerspace
[main] [StatsDExtension(akka://kamon)] Starting the Kamon(StatsD) extension
[main] [LogReporterExtension(akka://kamon)] Starting the Kamon(LogReporter) extension
[main] [AnnotationExtension(akka://kamon)] Starting the Kamon(Annotation) extension
My long operation yielded 1
My long operation yielded 1
My long operation yielded 1
My long operation yielded 1
[kamon-akka.actor.default-dispatcher-4] [akka://kamon/user/kamon-log-reporter]
+----------------------------------------------------------------+
|  Counters
|  -------------
|    longOperation => 4
+----------------------------------------------------------------+

What does this mean? We launched our impressive “high-frequency” program. It started the LogReporter module, the Annotation module, and the StatsD module. We performed 4 long operations, and the LogReporter reported 4 long operations.

But we are not finished yet. Let’s do some network dumping.

powerspace$ sudo tcpdump -nnvvXSs 1514 -i lo0 port 8125
Password:
tcpdump: listening on lo0, link-type NULL (BSD loopback), capture size 1514 bytes
19:38:28.488951 IP (tos 0x0, ttl 64, id 58658, offset 0, flags [none], proto UDP (17), length 79, bad cksum 0 (->9779)!)
127.0.0.1.60413 > 127.0.0.1.8125: [bad udp cksum 0xfe4e -> 0xd916!] UDP, length 51
0x0000: 4500 004f e522 0000 4011 0000 7f00 0001 E..O."..@.......
0x0010: 7f00 0001 ebfd 1fbd 003b fe4e 6b61 6d6f .........;.Nkamo
0x0020: 6e2e 4c61 7572 656e 7473 2d69 4d61 635f n.Laurents-iMac_
0x0030: 6c6f 6361 6c2e 636f 756e 7465 722e 6c6f local.counter.lo
0x0040: 6e67 4f70 6572 6174 696f 6e3a 347c 63   ngOperation:4|c

Our impressive High-Frequency-Trading system is now sending UDP frames to port 8125. The payload is plain text: the metric name, its value, and a type suffix (longOperation:4|c, where c stands for counter).

About etsy/statsd

Port 8125 is where a statsd daemon listens. It collects the application statistics of all monitored operations. We saw in the network dump that Kamon talks to it over UDP, which has some interesting properties (a hand-rolled sketch follows this list):

  1. First, it does not require a connection.
  2. Second, it is fast.
  3. Third, it is fire-and-forget: even if the collector is down or slow, it does not make our High-Frequency-Trading program slower.
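To show how simple the protocol is, here is a minimal sketch of what the kamon-statsd module does under the hood, hand-rolled in plain Java; the metric name is made up for illustration, Kamon generates its own as seen in the tcpdump above:

import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

public class StatsdCounterSketch {

    public static void main(String[] args) throws Exception {
        // statsd line protocol: "<metric name>:<value>|c" for a counter,
        // exactly the payload we saw in the network dump.
        byte[] payload = "kamon.myhost.counter.longOperation:4|c"
                .getBytes(StandardCharsets.US_ASCII);
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.send(new DatagramPacket(payload, payload.length,
                    InetAddress.getLoopbackAddress(), 8125));
        } // no connection, no acknowledgement, no blocking on the collector
    }
}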

About zabbix

We used the excellent (and open-source) statsd-zabbix-backend npm module. With it, we are able to display everything in a slick Zabbix interface.

However, Zabbix is not the only option: you can also use Grafana or Datadog.

An example of Grafana dashboard

Conclusion

What you measure is what you get. The whole conclusion of this blog post lies in that simple expression.
Do not wait: automate. Read the metrics. Writing excellent code is great, but your company is at risk if you don’t measure well.

By the way, if you like good code, we are hiring.
