JEE Performance with JMeter, Prometheus and Grafana. Complete Project from Scratch.

The main theme for this blog post will be measuring and monitoring JEE application performance. We will however build a whole solution from scratch. Starting with a simple example, we will go through couple of implementation decisions, trying to guess how they could impact performance. To be able to repeatedly collect measurements, we will set up a test plan in JMeter, and make first observations in VisualVM. Seeing how unrealistic these examinations are, we will build a monitoring solution using Prometheus and Grafana concluded with alerts setup to page us on Slack. All the technical components presented in the blog will make up for a fully automated, docker-based environment, that we will be able to drop and recreate anytime. We will also cover couple of useful details along the way, like implementing bulkheads, usefulness of measures like averages or percentiles, connecting VisualVM to a remote WildFly, and setting up a sequential test plan in JMeter. The post is accompanied by this GitHub project, which you can pull to run the complete solution (this section describes it in details).

Besides being great fun, writing this post helped me a lot to understand what kind of value isolated and laboratory-like performance measurements bring (like the ones done with JMeter), and how this might differ from the monitoring and alerting solutions we can implement on production environments. That is why I invite you to continue reading. Make yourself a solid cup of coffee, and let’s start.

Table of contents

Generators

As an example, and the starting point for this post, we will take a UUIDs generation problem presented in the previous blog post. To recap, we need to efficiently return a batch of UUIDs upon being requested. In the previous blog post I simply assumed, that UUIDs generation should take place in background, so that when a client request arrives, a set of pre-generated values is already waiting to be quickly returned. But is it actually faster than just generating these values in a loop, on demand? Let’s find out. The first generator to compare will be a straightforward one (in GitHub, here).

@Stateless
@Path(value = "straightforward")
public class StraightforwardResource {

    @Inject
    StraightforwardGenerator straightforwardGenerator;

    @GET
    @Produces(APPLICATION_JSON)
    public Response straightforward() {
        return Response.ok(straightforwardGenerator.getNextBatch()).build();
    }
}
@Stateless
public class StraightforwardGenerator {

    public JsonArray getNextBatch() {
        return Stream.of(UuidsGenerator.generateBatch())
                .collect(Json::createArrayBuilder, JsonArrayBuilder::add, JsonArrayBuilder::add)
                .build();
    }
}
public class UuidsGenerator {

    public static String[] generateBatch() {
        String[] idsBatch = new String[BATCH_SIZE];
        for (int i = 0; i < BATCH_SIZE; i++) {
            idsBatch[i] = UUID.randomUUID().toString();
        }
        return idsBatch;
    }
}

We will compare this method of generation with a producer-consumer approach based on a  LinkedBlockingQueue, producing UUIDs in background. And this will come in couple of flavours if you will.

  • One by one – generating a single UUID value at a time, in background, and putting it into a shared queue. This will be our first step that should lead to a better performance. Complete source code is available here.
    @Stateless
    @Path(value = "oneByOne")
    public class OneByOneResource {
    
        @Inject
        OneByOneGenerator oneByOneGenerator;
    
        @GET
        @Produces(APPLICATION_JSON)
        public Response oneByOne() {
            return Response.ok(oneByOneGenerator.getNextBatch()).build();
        }
    
        @GET
        @Produces(APPLICATION_JSON)
        @Path("status")
        public Response status() {
            return Response.ok(oneByOneGenerator.status()).build();
        }
    }
    
    @Singleton
    @ConcurrencyManagement(BEAN)
    @Startup
    public class OneByOneGenerator {
        private final BlockingQueue<String> ids = new LinkedBlockingQueue<>(ONE_BY_ONE_NUMBER_OF_CACHED_IDS);
        private List<Future<?>> futures = new ArrayList<>(ONE_BY_ONE_GENERATION_THREADS);
    
        @Resource(lookup = "java:jboss/ee/concurrency/executor/oneByOneGeneratorExecutor")
        ManagedExecutorService oneByOneGeneratorExecutor;
    
        @PostConstruct
        public void startGeneration() {
            for (int i = 0; i < ONE_BY_ONE_GENERATION_THREADS; i++) {
                futures.add(oneByOneGeneratorExecutor.submit(new OneByOneGenerationThread(ids)));
            }
        }
    
        public JsonArray getNextBatch() {
            try {
                String[] idsBatch = new String[BATCH_SIZE];
                for (int i = 0; i < BATCH_SIZE; i++) {
                    idsBatch[i] = ids.take();
                }
                return Stream.of(idsBatch)
                        .collect(Json::createArrayBuilder, JsonArrayBuilder::add, JsonArrayBuilder::add)
                        .build();
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
        }
    
        public JsonObject status() {
            return Json.createObjectBuilder()
                    .add("size", ids.size())
                    .add("max", ONE_BY_ONE_NUMBER_OF_CACHED_IDS)
                    .build();
        }
    
        @PreDestroy
        public void stopGeneration() {
            futures.stream().forEach(f -> f.cancel(true));
        }
    }
    class OneByOneGenerationThread implements Runnable {
        private final BlockingQueue<String> ids;
    
        OneByOneGenerationThread(BlockingQueue<String> ids) {
            this.ids = ids;
        }
    
        @Override
        public void run() {
            try {
                while (!currentThread().isInterrupted()) {
                    ids.put(UUID.randomUUID().toString());
                }
            } catch (InterruptedException ex) {
                // ignore
            }
        }
    }
  • Batch single queue – generating a batch of UUIDs at a time and putting it into a shared (single) queue. Compared to the one by one generator, what we could count on with this approach would be a significant reduction in congestion on the shared queue. We should therefore expect another performance boost. Full source code is here.
    @Stateless
    @Path(value = "batchSingleQueue")
    public class BatchSingleQueueResource {
    
        @Inject
        BatchSingleQueueGenerator batchSingleQueueGenerator;
    
        @GET
        @Produces(APPLICATION_JSON)
        public Response batchSingleQueue() {
            return Response.ok(batchSingleQueueGenerator.getNextBatch()).build();
        }
    
        @GET
        @Produces(APPLICATION_JSON)
        @Path("status")
        public Response status() {
            return Response.ok(batchSingleQueueGenerator.status()).build();
        }
    }
    @Singleton
    @ConcurrencyManagement(ConcurrencyManagementType.BEAN)
    @Startup
    public class BatchSingleQueueGenerator {
        private final BlockingQueue<String[]> batches = new LinkedBlockingQueue<>(BATCH_SINGLE_QUEUE_NUMBER_OF_CACHED_BATCHES);
        private List<Future<?>> futures = new ArrayList<>(BATCH_SINGLE_QUEUE_GENERATION_THREADS);
    
        @Resource(lookup = "java:jboss/ee/concurrency/executor/batchSingleQueueGeneratorExecutor")
        ManagedExecutorService batchSingleQueueGeneratorExecutor;
    
        @PostConstruct
        public void startGeneration() {
            for (int i = 0; i < BATCH_SINGLE_QUEUE_GENERATION_THREADS; i++) {
                futures.add(batchSingleQueueGeneratorExecutor.submit(new BatchSingleQueuesGenerationThread(batches)));
            }
        }
    
        public JsonArray getNextBatch() {
            try {
                return Stream.of(batches.take())
                        .collect(Json::createArrayBuilder, JsonArrayBuilder::add, JsonArrayBuilder::add)
                        .build();
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
        }
    
        public JsonObject status() {
            return Json.createObjectBuilder()
                    .add("size", batches.size())
                    .add("max", BATCH_SINGLE_QUEUE_NUMBER_OF_CACHED_BATCHES)
                    .build();
        }
    
        @PreDestroy
        public void stopGeneration() {
            futures.stream().forEach(f -> f.cancel(true));
        }
    }
    class BatchSingleQueuesGenerationThread implements Runnable {
        private final BlockingQueue<String[]> batches;
    
        BatchSingleQueuesGenerationThread(BlockingQueue<String[]> batches) {
            this.batches = batches;
        }
    
        @Override
        public void run() {
            try {
                while (!currentThread().isInterrupted()) {
                    batches.put(UuidsGenerator.generateBatch());
                }
            } catch (InterruptedException ex) {
                // ignore
            }
        }
    }
  • Batch multiple queues – generating a batch of UUIDs at a time and putting them into separate queues (each queue belonging to a separate thread). To fetch a generated batch a producer thread (queue) is selected randomly in this case. This should further reduce congestion, this time by the factor equal to the number of threads. Have a look here for a complete source code.
    @Stateless
    @Path(value = "batchMultipleQueues")
    public class BatchMultipleQueuesResource {
    
        @Inject
        BatchMultipleQueuesGenerator batchMultipleQueuesGenerator;
    
        @GET
        @Produces(APPLICATION_JSON)
        public Response batchMultipleQueues() {
            return Response.ok(batchMultipleQueuesGenerator.getNextBatch()).build();
        }
    
        @GET
        @Produces(APPLICATION_JSON)
        @Path("status")
        public Response status() {
            return Response.ok(batchMultipleQueuesGenerator.status()).build();
        }
    }
    @Singleton
    @ConcurrencyManagement(ConcurrencyManagementType.BEAN)
    @Startup
    public class BatchMultipleQueuesGenerator {
        private List<BatchMultipleQueuesGenerationThread> threads = new ArrayList<>(BATCH_MULTIPLE_QUEUES_GENERATION_THREADS);
        private List<Future<?>> futures = new ArrayList<>(BATCH_MULTIPLE_QUEUES_GENERATION_THREADS);
    
        @Resource(lookup = "java:jboss/ee/concurrency/executor/batchMultipleQueuesGeneratorExecutor")
        ManagedExecutorService batchMultipleQueuesGeneratorExecutor;
    
        @PostConstruct
        public void startGeneration() {
            for (int i = 0; i < BATCH_MULTIPLE_QUEUES_GENERATION_THREADS; i++) {
                BatchMultipleQueuesGenerationThread thread = new BatchMultipleQueuesGenerationThread();
                threads.add(thread);
                futures.add(batchMultipleQueuesGeneratorExecutor.submit(thread));
            }
        }
    
        public JsonArray getNextBatch() {
            return Stream.of(aBatchFromRandomThread())
                    .collect(Json::createArrayBuilder, JsonArrayBuilder::add, JsonArrayBuilder::add)
                    .build();
        }
    
        private String[] aBatchFromRandomThread() {
            return threads
                    .get((int) (Math.random() * BATCH_MULTIPLE_QUEUES_GENERATION_THREADS))
                    .getNextBatch();
        }
    
        public JsonObject status() {
            return createObjectBuilder()
                    .add("summary", createObjectBuilder()
                            .add("size", aggregatedQueuesCurrentSize())
                            .add("max", aggregatedQueuesMaxSize()))
                    .add("threads", threads.stream()
                            .map(t -> createObjectBuilder()
                                    .add("size", t.getCurrentSize())
                                    .add("max", BATCH_MULTIPLE_QUEUES_NUMBER_OF_CACHED_BATCHES))
                            .collect(Json::createArrayBuilder, JsonArrayBuilder::add, JsonArrayBuilder::add))
                    .build();
        }
    
        private int aggregatedQueuesCurrentSize() {
            return threads.stream().mapToInt(BatchMultipleQueuesGenerationThread::getCurrentSize).sum();
        }
    
        private int aggregatedQueuesMaxSize() {
            return BATCH_MULTIPLE_QUEUES_GENERATION_THREADS * BATCH_MULTIPLE_QUEUES_NUMBER_OF_CACHED_BATCHES;
        }
    
        @PreDestroy
        public void stopGeneration() {
            futures.stream().forEach(f -> f.cancel(true));
        }
    }
    class BatchMultipleQueuesGenerationThread implements Runnable {
        private final BlockingQueue<String[]> batches = new LinkedBlockingQueue<>(BATCH_MULTIPLE_QUEUES_NUMBER_OF_CACHED_BATCHES);
    
        @Override
        public void run() {
            try {
                while (!currentThread().isInterrupted()) {
                    batches.put(UuidsGenerator.generateBatch());
                }
            } catch (InterruptedException ex) {
                // ignore
            }
        }
    
        String[] getNextBatch() {
            try {
                return batches.take();
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
        }
    
        int getCurrentSize() {
            return batches.size();
        }
    }
  • Cached JSON response – generating an already serialised batch of UUIDs and putting it into a shared queue. This time there should be a gain in performance not only due to a batch approach, but also due to the shift of JSON serialisation processing from the consuming part, to the producing one. On GitHub, here.
    @Stateless
    @Path(value = "cachedJson")
    public class CachedJsonResource {
    
        @Inject
        CachedJsonGenerator cachedJsonGenerator;
    
        @GET
        @Produces(APPLICATION_JSON)
        public Response cachedJson() {
            return Response.ok(cachedJsonGenerator.getNextBatch()).build();
        }
    
        @GET
        @Produces(APPLICATION_JSON)
        @Path("status")
        public Response status() {
            return Response.ok(cachedJsonGenerator.status()).build();
        }
    }
    @Singleton
    @ConcurrencyManagement(ConcurrencyManagementType.BEAN)
    @Startup
    public class CachedJsonGenerator {
        private final BlockingQueue<JsonArray> batches = new LinkedBlockingQueue<>(CACHED_JSON_NUMBER_OF_CACHED_BATCHES);
        private List<Future<?>> futures = new ArrayList<>(CACHED_JSON_GENERATION_THREADS);
    
        @Resource(lookup = "java:jboss/ee/concurrency/executor/cachedJsonGeneratorExecutor")
        ManagedExecutorService cachedJsonGeneratorExecutor;
    
        @PostConstruct
        public void startGeneration() {
            for (int i = 0; i < CACHED_JSON_GENERATION_THREADS; i++) {
                futures.add(cachedJsonGeneratorExecutor.submit(new CachedJsonGenerationThread(batches)));
            }
        }
    
        public JsonArray getNextBatch() {
            try {
                return batches.take();
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
        }
    
        public JsonObject status() {
            return Json.createObjectBuilder()
                    .add("size", batches.size())
                    .add("max", CACHED_JSON_NUMBER_OF_CACHED_BATCHES)
                    .build();
        }
    
        @PreDestroy
        public void stopGeneration() {
            futures.stream().forEach(f -> f.cancel(true));
        }
    }
    class CachedJsonGenerationThread implements Runnable {
        private final BlockingQueue<JsonArray> batches;
    
        CachedJsonGenerationThread(BlockingQueue<JsonArray> batches) {
            this.batches = batches;
        }
    
        @Override
        public void run() {
            try {
                while (!currentThread().isInterrupted()) {
                    JsonArray batch = Stream.of(UuidsGenerator.generateBatch())
                            .collect(Json::createArrayBuilder, JsonArrayBuilder::add, JsonArrayBuilder::add)
                            .build();
                    batches.put(batch);
                }
            } catch (InterruptedException ex) {
                // ignore
            }
        }
    }

Note that for each solution based on a queue, there is another REST status endpoint. It allows for an insight into the state of the queue, to see how many items a given queue contains at the moment compared to its capacity. It will turn out to be quite helpful later in the post.

Bulkheads

Introduction

When running couple of generators to measure performance, we need to make sure they do not affect each other. In particular, we would not like to run them in a single thread pool. What we need to introduce here are bulkheads. In fact, this is more of an approach or a guideline if you will, as the exact implementation depends on the particular problem. It is also very unlikely, that you will have the same logic implemented in couple of variants in your production code. But still, the idea is clear and universal and applies not only to our trivial example. Therefore, what we need to remember, is to always care for clear separation of concerns. Execution of one part of our application/system must not impact other parts.

So even in our simple example, we want to keep each generator isolated. And indeed, as you might have noticed, each queue-based generator is running behind a separate ManagedExecutorService. It is injected with an @Resource annotation, and can be used like in the extract from the one by one generator presented below.

@Singleton
@ConcurrencyManagement(BEAN)
@Startup
public class OneByOneGenerator {
    (...)
    private List<Future<?>> futures = new ArrayList<>(ONE_BY_ONE_GENERATION_THREADS);

    @Resource(lookup = "java:jboss/ee/concurrency/executor/oneByOneGeneratorExecutor")
    ManagedExecutorService oneByOneGeneratorExecutor;

    @PostConstruct
    public void startGeneration() {
        for (int i = 0; i < ONE_BY_ONE_GENERATION_THREADS; i++) {
            futures.add(oneByOneGeneratorExecutor.submit(new OneByOneGenerationThread(...)));
        }
    }
    (...)
}
Defining ManagedExecutorService in WildFly

Before a ManagedExecutorService can be used, it has to be defined in the application server. To automate this process in WildFly, we can use bin/jboss-cli.* executable. In our case, this was automated on the stage of creation of a wildfly-configured docker container. First, the following Dockerfile (available also here) copies initWildfly.sh script into /docker-entrypoint-initmw.d folder in the container, together with the contents of this folder. It is then executed during the creation of the container with the RUN command  defined also in the Dockerfile. What initWildfly.sh script does, is running all *.cli files in /docker-entrypoint-initmw.d folder with bin/jboss-cli.sh executable.

FROM wildfly
MAINTAINER Bartosz Kamiński, http://highcohesionloosecoupling.com
RUN rm ${WILDFLY_HOME}/bin/standalone.conf \
  && ${WILDFLY_HOME}/bin/add-user.sh admin admin --silent
ADD standalone.conf ${WILDFLY_HOME}/bin/
COPY docker-entrypoint-initmw.d/ /docker-entrypoint-initmw.d/
COPY initWildfly.sh /docker-entrypoint-initmw.d/initWildfly.sh
RUN chmod +x /docker-entrypoint-initmw.d/initWildfly.sh \
  && /docker-entrypoint-initmw.d/initWildfly.sh
ENTRYPOINT ${WILDFLY_HOME}/bin/standalone.sh -c standalone-full.xml -b=0.0.0.0 -bmanagement=0.0.0.0
EXPOSE 8080 8787 9990
#!/bin/bash
for SCRIPT in /docker-entrypoint-initmw.d/*.cli
do
  if [ -f $SCRIPT ]
  then
    $WILDFLY_HOME/bin/jboss-cli.sh --file=$SCRIPT
  fi
done

In case you do not need such automation, or you need to quickly run a *.cli file on your local environment, you can simply call it directly. This is how the command looks like on my Windows machine in PowerShell: .\jboss-cli.ps1 --file=0001_managed_executor_services.cli.

Side note: you can also configure everything manually in WildFly Management Interface (Configuration > Subsystems > Subsystem: EE > View > Services > Executor), but this is something I try to always avoid because it is too error prone and tedious thing to do. Automation is usually a much better choice for me.

The *.cli file used in our project is available here. And the command to create a ManagedExecutorService for the one by one generator looks like in the example below.

/subsystem=ee/managed-executor-service=oneByOneGeneratorExecutor:add(jndi-name="java:jboss/ee/concurrency/executor/oneByOneGeneratorExecutor", core-threads=4, hung-task-threshold=0, long-running-tasks=true, keepalive-time=0)

For the detailed description of each attribute, you might want to have a look in this documentation.

Test plan in JMeter

So far, we can open each endpoint to see that it is actually working:

Now, we would like to find out, what is the performance of each generator. We will use JMeter for this. In our first attempt we should measure each endpoint in isolation. Since we do not want to click too much, we will prepare a sequential test plan, which will call each generator endpoint one after another reporting on its throughput. You can download a complete JMeter uuid-generation-sequential.jmx file from here.

There is a full tree of all the steps presented besides. Let’s have a look at couple of key characteristics of the test plan.

Sequential thread groups execution

The test plan was organised into separate thread groups (these are the nodes with an icon of a cog). In order to be able to run them sequentially, you have to select a Run Thread Groups consecutively checkbox in the settings of the test plan (a root in the tree). They will be executed all at once otherwise.

User defined variables for the test plan

To have comparable results, all of the thread groups need to be executed with the same properties (same number of threads and loop count). In order to avoid too much clicking once we decide to adjust these values, we might use test plan variables. We can define them, again, in the test plan settings – in the User Defined Variables section. As you can see, we have defined two variables: NUMBER_OF_THREADS = 8 and LOOP_COUNT = 125 (with the intent to iterate 125 times per thread, meaning 1000 times total, per thread group). By looking for example at the Straightforward thread group, they can be referenced in Thread Properties section with ${NUMBER_OF_THREADS} and ${LOOP_COUNT} expressions.

Reasonable breaks between thread groups

By the nature of each queue-based generator, even after a single thread group is finished, consumers could have drained the queue so that producers might keep on working for some time until a particular queue is full again. Therefore before hitting another generator, we need to make sure that CPU is not busy working on the queue from the previous generator. We take care of this issue in thread groups called Wait to refill … queue. Each such thread group executes a sequence of steps. It starts with a call to a BeanShell PreProcessor step, which resets values of variables used in the sequence: SIZE and MAX. Next, we have a While Controller with a javascript expression that continues the loop until the current size of the queue is not equal to its capacity: ${__javaScript( "${SIZE}" != "${MAX}";)}. Inside the loop, in each iteration, we are first calling a .../status endpoint exposed for each queue-based generator, to extract two values from its output, that could look like {"size":405505,"max":1000000}. Extracted values are assigned to test plan scoped variables in two JSON Extractor steps. In the configuration of JSON Extractor, we need to specify a name of the variable, and a JSON Path expression, within which a dollar character represents the whole response. So to extract a size from the presented output, we need to write $.size expression. Out of courtesy, we can now introduce a short delay, so that our .../status endpoint is not overloaded with requests. We do it in a Constant Timer step. When the loop iterations are over, we add another think time delay in a Test Action and another Constant Timer step. The point of having the latter is to let JVM breathe a little before moving on to the next generator.

Aggregate report

So now it is time to actually run our test plan and have a look at the first results. Let’s open an Aggregate Report and press Start. Here is the output from a single execution (I skipped couple of columns so that the table fits in the page).

Label # Samples Average Median 95% Line Max Throughput
Reference HTTP request 1000 5 5 9 18 1508,3
Straightforward HTTP request 1000 20 17 51 260 344,94653
OneByOne HTTP request 1000 6 6 12 40 1187,64846
Check OneByOne queue status 4 69 2 271 271 2,22346
BatchSingleQueue HTTP request 1000 6 5 11 279 1114,8272
Check BatchSingleQueue status 4 1 1 2 2 2,62467
BatchMultipleQueues HTTP request 1000 6 6 11 271 1020,40816
Check BatchMultipleQueues status 3 1 1 2 2 2,94985
Cached JSON HTTP request 1000 5 5 11 33 1285,34704
TOTAL 5011 9 6 29 279 155,14893
Observations

So what do we see here? First we need to remember, that each generator is called with no think time in between requests (the only pauses come after each generator). Therefore the Throughput column informs about maximal throughput achieved by each generator. A straightforward generator handled 344 requests per second in this test. All queue-based generators were around three times faster. Max response time looks very promising for both one by one and cached JSON generators. Combined with highest throughput, cached JSON looks really promising so far. The point of a reference generator is to show the limit in current machine/network setup. It always responds with the same output, equal in size to the output of other generators. You can find its source code here. Note that JMeter reports on all HTTP requests, therefore we also see all .../status endpoints calls in this report. These we have to ignore, which is little inconvenient when analysing the data.

Misleading percentiles

I guess we all know that an average does not say a lot. We can get the same value from two dramatically different sets. Therefore when measuring response time (and for alerting on response time in particular) they are plain useless. So what kind of a measurement percentiles are? For example, a value of 51 milliseconds for the 95th percentile for a straightforward generator says that the response time for 95% of all observations was below 51 milliseconds. And that 5% of all the cases was worse than that. And, by the way, a median is the same value as the 50th percentile. Still, even being much more powerful technique than measuring averages, let’s have a look at how deceiving percentiles might be. 95% of all cases might seem pretty high, but we need to remember, that with such value we have no clue what the worse cases actually were. And for the batch single queue and batch multiple queues generators, those worse cases were around 25 times worse than the 95th percentile. So even though it was only 5% that misbehaved, depending on a situation this might actually mean quite a lot very disappointed users.

Gil Tene, in his talk How NOT to Measure Latency takes this much further, presenting the following example (it starts at 9:45 minute into the talk, nevertheless I definitely recommend watching it from start to finish). Gil analyses overall user experience within a session that involves 5 page loads, averaging 40 resources per page, and asks a question what is the chance, that in such scenario, a user really experiences the 95th percentile response time. Provided that we treat loading each resource as an independent event with 0.95 probability of making it within the observed response time, the probability that all 200 requests make it equals to 0.95200 = 0,0035%. As we can see, measuring 95th percentile might be actually pretty useless. Gil argues that we might very often need many more nines in the observed percentiles than we might initially think.

VisualVM

This is something definitely worth mentioning, as you will be, or probably already are, using this tool from time to time. It is important when looking for memory leaks, and it is important to see what CPU is actually doing when system monitor roughly reports 800% usage. We will have a brief look at VisualVM features, but first let’s find out how to connect it to a remote WildFly running in a docker container.

Connecting to a remote WildFly

In case WildFly was running locally, VisualVM would automatically detect it. In our case however we need to set up a remote connection to a JMX service on WildFly. And there are two ways we can do this. Following official WildFly documentationthis is switched on by default in standalone mode and accessible over port 9990. And this happens to be the same port that is used by the Management Interface. We need to remember though, that compared to previous JBoss versions, WildFly improves security, and JMX endpoint is password protected now. So even though 9990 management port is already exposed, both Management Interface and JMX require adding a user to the management realm to work. In our setup this is done automatically when wildfly-configured Docker container is being created, according to its (already mentioned) Dockerfile (available here).

FROM wildfly
MAINTAINER Bartosz Kamiński, http://highcohesionloosecoupling.com
RUN rm ${WILDFLY_HOME}/bin/standalone.conf \
  && ${WILDFLY_HOME}/bin/add-user.sh admin admin --silent
ADD standalone.conf ${WILDFLY_HOME}/bin/
(...)

To make a connection from VisualVM, we will use a vendor-specific protocol, expressing the connection URL in the following format: service:jmx:[vendor-specific-protocol]://[host]:[port]. For the host part, when starting my Docker machine, it reports 192.168.99.100. For the vendor-specific-protocol we will take http-remoting-jmx. We also need to add jboss-cli-client.jar to the classpath when starting VisualVM. This is the command I use when starting VisualVM on my Windows machine.

"c:\Program Files\Java\jdk1.8.0_121\bin\jvisualvm.exe" -cp:a c:\Projects\software\wildfly-10.1.0.Final\bin\client\jboss-cli-client.jar

Putting all the above elements together, we are now ready to set up a connection in VisualVM. Right-click on a Remote node in Applications panel, and add a remote host (this was 192.168.99.100 in my case). On this remote host, select Add JMX Connection. Put service:jmx:http-remoting-jmx://192.168.99.100:9990 as a Connection, select Use security credentials checkbox and put admin as Username and Password. Press OK, and we are done. We can now make a connection to JMX exposed by a remote WildFly.

The other way to connect to JMX on WildFly is more conventional. It requires modifying a standalone.conf file.

(...)
#
# Specify options to pass to the Java VM.
#
if [ "x$JAVA_OPTS" = "x" ]; then
   JAVA_OPTS="-Xms4096m -Xmx4096m -Djava.net.preferIPv4Stack=true -Dfile.encoding=UTF-8 "
   JAVA_OPTS="$JAVA_OPTS -Djboss.modules.system.pkgs=$JBOSS_MODULES_SYSTEM_PKGS -Djava.awt.headless=true "
   JAVA_OPTS="$JAVA_OPTS -Xbootclasspath/p:/opt/wildfly-10.1.0.Final/modules/system/layers/base/org/jboss/logmanager/main/jboss-logmanager-2.0.4.Final.jar -Djboss.modules.system.pkgs=org.jboss.byteman,org.jboss.logmanager -Djava.util.logging.manager=org.jboss.logmanager.LogManager "
   JAVA_OPTS="$JAVA_OPTS -Djava.rmi.server.hostname=192.168.99.100 -Dcom.sun.management.jmxremote=true -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.port=7091 -Dcom.sun.management.jmxremote.rmi.port=7091 -Dcom.sun.management.jmxremote.ssl=false "
else
   echo "JAVA_OPTS already set in environment; overriding default settings with values: $JAVA_OPTS"
fi
(...)

Couple of remarks regarding this approach.

  • Once com.sun.management.jmxremote... parameters are defined, WildFly will complain about a LogManager on startup. This is what line number 8 fixes.
  • In order to avoid setting up another port, com.sun.management.jmxremote.rmi.port has to be explicitly set to the same port as com.sun.management.jmxremote.port parameter.
  • Setting up authentication (disabled in this example) requires specifying additional access and password files.
  • A hostname parameter (java.rmi.server.hostname) has to correspond to the interface, to which VisualVM will be connecting.
  • The first JMX access configuration might be favourable in case WildFly monitoring port is already exposed. Depending on network setup and existing procedures, opening another port on firewalls might be too time consuming or even impossible.
Network situation with Docker

Let’s take advantage of the moment and have a closer look at what our network setup is. Because, to be precise, continuing the first connection method, instead of 192.168.99.100:9990 we could equally well make the JMX connection to localhost:9990 endpoint (settings other than host would remain the same).

So let me explain. Each server is running in a separate Docker container. And by default, no service running in a Docker container is visible to the Docker machine (not to say to a Windows machine), meaning no port is exposed to the Docker machine. For this to happen, we need set up port forwarding on the level of a Docker container, at the moment of Docker container creation.

docker run --name wildfly-configured -d -p 8080:8080 -p 9990:9990 -p 8787:8787 --network jee wildfly-configured

In our example, this command is executed when running this script: up.js (which calls this script: ups.js, with the help of this script: docker.js). And this is the moment, at which we can access port 9990 by connecting to a Docker machine (which was available under  192.168.99.100 address).

Going further, as you will see in this section, when running the whole example, we actually call this script: windowsDockerMachineUp.ps1 first. It works in Windows, and when executed for the first time, it creates a Docker machine in Oracle VirtualBox. It also sets up ports forwarding from Windows machine to Docker machine for all of the most important ports used by servers in the example, 9990 being one of them. And this is what makes a JMX connection working also with localhost:9990 .

Features

Now that we made VisualVM working with our WildFly, let’s have a look at couple of its features. In the Overview tab, there is basic information about the JVM which runs the process, like JVM version, Java Home location and JVM arguments. In the Monitor we can see uptime, overall CPU (with garbage collection listed separately) and memory (heap and metaspace) usage, number of classes and threads. In the Threads tab we can see when and which threads are working, and the Sampler allows collecting performance and memory data according to the specified sampling rates (it works based on analysing thread dumps in intervals). It is also possible to use plugins, among which MBeans and Threads Inspector turn out to be quite useful. The former brings MBeans viewer known from JConsole, and the latter makes it easier to access stack traces of selected threads in the Threads tabs. You will find a list of all available plugins here.

Another tab is the Profiler tab, which allows instrumented profiling of code. It is more accurate than the Sampler, but it might also add significant overhead. Mind that this tab is not visible for remote connections, thus it is not available in our setup. VisualVM needs to be running locally to allow for instrumented profiling. In our case we could start VisualVM in an X server exposed via VNC, and connect to the server with any VNC viewer. For remote, instrumented, profiling we could also use a profiler available in NetBeans. This is mostly the same profiler, as VisualVM downloaded separately or JVisualVM started from JDK installation. The difference is that in case of NetBeans profiler, we are instructed how to prepare additional Remote profiling pack to be executed on a remote machine to allow for attaching remote profiler. Each time running instrumented profiling, remember to limit the scope of instrumented classes to lower the overhead.

Working with Sampler

Our example is pretty simple, therefore we will not make any breaking discoveries. We can however have a look at the Sampler tab for a moment, to get the feeling how useful it might be to better understand how much time our applications spend in particular fragments of code. Development means making many decisions based on assumptions and intuition. Profiling gives us a chance to verify them in action. Here, we will use a Sampler which is a sampling profiler. As we have already noticed, it works based on analysing thread dumps taken periodically. It is indeed less accurate than measurements done on instrumented code. We need to remember though, that code instrumentation just by itself significantly influences observed performance. When applied to an already overloaded application server running larger code base, it might even kill the whole server.

We will be comparing CPU samples for straightforward and cached JSON generators. To limit the noise in this exercise, we will enable the sampler only for com.hclc package (Settings > Profile only packages > com.hclc). Because each generation is expected to take quite short time, we will also change the sampling frequency to 20ms to improve accuracy (Settings > Sampling frequency > 20ms). As a load generator we will use JMeter uuid-generation-distributed-load.jmx file, available here. We will first run a test plan for straightforward, and next, separately, for cached JSON generator. Each one will generate 12000 samples over around 1 minute. The load will be thus distributed, and not concentrated as in the previous measurement. A rate of around 200 requests per second makes a distance safe enough from the maximum capabilities of both generators. Our goal however is to verify the behaviour of the application in a stable state.

These are the sampling results observed in VisualVM.

Straightforward generator
Cached JSON generator

If we look at the time spent in StraightforwardResource, straightforward() method and CachedJsonResource, cachedJson() method, we will see an enormous difference. Self Time is 0 in both cases, which means that these methods do not do anything by themselves, nevertheless the Total Time, which includes further methods calls is around 165 seconds in case of straightforward generator compared to 251 milliseconds in case of cached JSON. We are talking three orders of magnitude here. This is quite consistent with our intuition. After all, cached JSON generator responds only taking a fully prepared message from the queue. As opposed to the straightforward one, which needs to generate 1000 UUIDs and serialize them on each request. Both generators spend significant (and comparable) amount of time in UuidsGenerator, generateBatch() method, which is also expected, as both generators need to produce the same number of results in our tests, after all.

And these are the same results observed in JMeter. What is very important, even though cached JSON generator appears to be faster also here, the difference is not that spectacular. Its response time might be 2-4 times better, but it is not even a single order of magnitude difference.

Label # Samples Median 90% Line 95% Line 99% Line Max Throughput
Straightforward HTTP request 12000 13 44 66 181 689 206,84662
Cached JSON HTTP request 12000 4 13 23 91 161 208,14542

To understand it, we need to remind ourselves how significant overhead of network communication in our case is. And this we can tell by going back to the results of the reference HTTP request measurements. So the discrepancy we are observing here is between the actual response time perceived by JMeter and the service time measured on the server. And it begs for analysis how important this might actually be to even consider performance optimisation of our straightforward generator, knowing that the potential gain might get lost in the network (or any other, for example queuing) overhead. Maybe it is already good enough? Sometimes good enough lets us meet the deadline. But sometimes it might have serious consequences. Like when our server is handling tens or hundreds of requests comparable to the straightforward generator one to fulfil a single business case. In such case improving performance of a single element might leave more computing power for another. The situation would also change if we used API gateway pattern to merge multiple, otherwise small, client requests into a single, larger one. In such case, network overhead applies to all calls at once, and not individually to each request. Shortening service time for couple of such requests could be therefore more apparent in such case.

Bulkheads in action

Staying a little longer in the scope of VisualVM, let’s have a look at the Threads tab. And let’s go back for a while to the bulkheads we have already talked about. We have implemented them by introducing thread pools configured in WildFly as managed executor services. And this is something we can easily monitor in the Threads tab. As you can see below, this is exactly the advantage of using explicitely defined thread pools. Not only gives it a separation of concerns in code, but also makes it easier to visually monitor each thread pool. Especially that the name of each managed executor service defined in WildFly is included in the name of the thread. The screenshot below corresponds to the execution of the sequential JMeter test plan described in this section above.

In our case, the number of threads that are actually submitted into each thread pool is statically defined in the Parameters class. Therefore we will not experience over saturation of the thread pools neither prolonged waiting times due to extensive queuing. These could be however cases in which threads monitoring is even more important to better understand how exactly the system is behaving and which threads are actually working and when.

Monitoring in production

So far we have been analysing behaviour of our code in artificial, unrealistic scenarios. No other system was running on the machine during the measurements and also the load was carefully planned for each test case. This is however very important to realise that even though such observations are important, and do give us some starting point, they tell us exactly nothing about how the system will actually behave in the real world, under real load and real hardware resources utilisation. To address these issues we have no other way than to start monitoring our systems in production.

Monitoring is a very broad topic. We can be monitoring many quantitive characteristics of a system like response times as perceived by the end users, behaviour of the network, hardware resources utilisation on particular nodes (like application server nodes or database nodes) or detailed behaviour of our application including metrics based on business cases. Here we will focus on application-level monitoring. We will instrument our code to observe couple of sample metrics. Obviously we do not have to apply that many metrics, as we will discuss in a moment to all elements of our systems. This is rather to offer couple of ideas of what we can easily measure without too much coding and too significant performance overhead.

Solution overview

Our goal is to constantly monitor performance of our generators. But before we discuss the solution, here are couple of cautionary words. In this exercise, we are applying all metrics to all generators. Well, we might experiment on production (at least for some time), to compare them in the real world. However eventually we would probably pick only one of them. What is more, even though conceptually we are discussing a solution with production in mind, we are still using JMeter as a load generator. Mind that even though we can make couple of interesting observations also with such unrealistic load, the main intention of having JMeter this time is just to demonstrate the solution.

OK, so what’s the solution? We are measuring service time of all our generators at the outermost place we can find in our code: REST endpoints. All the collected observations are next exposed to Prometheus with the help of Prometheus Java client. Prometheus is periodically calling us to fetch latest metrics and is storing them in internal time series database. To visualise the collected data we are using Grafana, which gets it via a built-in Prometheus data source. Even though this alone gives us great monitoring experience, we still need to be actively informed, or paged if you will, whenever our application is misbehaving. For this we are using alerting solution offered by Prometheus, which is configured to page us on Slack.

Instrumenting the code

So let’s start with code instrumentation. Following a Single Responsibility Principle we will try to make our implementation meet the following goals.

  1. We do not want to clutter our original generators’ REST endpoints with monitoring code. Our goal is therefore to limit changes to the monitored code as much as possible.
  2. We need a separate piece of code that will perform actual measurements.
  3. And a separate piece of code that will prepare them to be exposed to the external monitoring system.

To meet the first two goals we might use an interceptor. It can be noninvasively attached to the initial REST endpoints, and it can be used to measure execution time of the code it wraps. Having this time measured and knowing the name of the generator, it can next publish this information as a CDI event. And this event might be eventually processed in the metrics collector (the point of which is to meet the third goal), which observes CDI events. So let’s have a look at the code.

@Stateless
@Path(value = "cachedJson")
public class CachedJsonResource {

    @Inject
    CachedJsonGenerator cachedJsonGenerator;

    @GET
    @Produces(APPLICATION_JSON)
    @Interceptors(GenerationMetricInterceptor.class)
    public Response cachedJson() {
        return Response.ok(cachedJsonGenerator.getNextBatch()).build();
    }

    @GET
    @Produces(APPLICATION_JSON)
    @Path("status")
    public Response status() {
        return Response.ok(cachedJsonGenerator.status()).build();
    }
}
public class GenerationMetricInterceptor {
    private static final double NANO_TO_SECONDS = 0.000_000_001;

    @Inject
    Event<GenerationMetricMessage> generationMetricsChannel;

    @AroundInvoke
    public Object measure(InvocationContext invocationContext) throws Exception {
        long start = System.nanoTime();
        try {
            return invocationContext.proceed();
        } finally {
            generationMetricsChannel.fire(
                    new GenerationMetricMessage(
                            invocationContext.getMethod().getDeclaringClass().getSimpleName(),
                            (System.nanoTime() - start) * NANO_TO_SECONDS
                    )
            );
        }
    }
}
public class GenerationMetricMessage {
    private final String generatorName;
    private final double durationInSeconds;

    public GenerationMetricMessage(String generatorName, double durationInSeconds) {
        this.generatorName = generatorName;
        this.durationInSeconds = durationInSeconds;
    }

    public String getGeneratorName() {
        return generatorName;
    }

    public double getDurationInSeconds() {
        return durationInSeconds;
    }
}
@Singleton
@Startup
public class MetricsCollector {

    @Inject
    Instance<GenerationsMetricCollector> generationsCollectors;

    @Inject
    QueueFillFactorCollector queueFillFactorCollector;

    @PostConstruct
    public void registerCollectors() {
        generationsCollectors.forEach(GenerationsMetricCollector::register);
        queueFillFactorCollector.register();
    }

    public void collectGenerationTimeMetric(@Observes GenerationMetricMessage metricMessage) {
        generationsCollectors.forEach(c -> c.collectGenerationTimeMetric(metricMessage));
    }

    @PreDestroy
    public void unregisterCollectors() {
        CollectorRegistry.defaultRegistry.clear();
    }
}
Collectors

At this point of time, there are four types of metrics available in Prometheus and Prometheus Java client. In this example we are using only two of them: a counter and a gauge. The former can go only up. And the latter can go both up and down, and can also accept a specific value. Even though both these metrics are very simple, they can be used to build a powerful monitoring. Let’s see how we can use them to measure performance of our generators.

Total generations tells us how many times a generator was called.

Generations duration tells us how much time a generator was actually working. More precisely, this is a wall clock time spent in the REST endpoint method. This would correspond to the Total Time in VisualVM. We should remember, that this time might also include waiting time. This waiting time could be for example the time, in which CPU was assigned to a different thread after some generation service started working. And this is actually OK, since we most probably do want to observe service time as close as possible to how user perceives it. We already skip the whole network overhead, so let’s at least include all the distractions taking place while our service is working.

Longest duration tells the value of the longest duration of generation within the last 5 seconds.

Queue fill factor applies only to queue-based generators, and tells the ratio of current number of elements in the queue compared to the capacity of the queue.

Before we have a look at each collector in details, let’s stay with MetricsCollector a little longer. There are however a few useful JEE techniques we can use. First, the collector itself is declared as a @Singleton, as we need only a single point, where metrics are collected. It is also initiated when application is started (on deployment or with server startup in case application is already deployed) due to the @Startup annotation. Each Prometheus collector needs to be registered prior to being used, and this happens also on application startup. Once a @Singleton enterprise java bean is created, a registerCollectors() method is called due to the @PostConstruct annotation. To make the registration and metrics collection processes easier, all instances of GenerationsMetricCollector interface are automatically looked up and injected with the @Inject Instance<GenerationsMetricCollector> expression.

Below you will find the code for each collector. Total generations and generations duration collectors are very simple. The former only increments a value of the counter by one and the latter by the generation time retrieved from a CDI event. Longest duration collector is little more complex. It relies on DelayedObservations auxiliary class, which is responsible for keeping observations only for a specified time. It is built upon a DelayedObservation class that implements java.util.concurrent.Delayed interface. Queue fill factor collector is based on a gauge, and also works in a different style than the collectors mentioned above. Instead of updating a Prometheus collector when the observed event takes place, it exposes a value upon being asked. For this, it forwards a metrics request to generators themselves, to the fillFactor() method exposed by each queue-based generator.

@Singleton
public class TotalGenerationsCollector implements GenerationsMetricCollector {
    private Counter totalGenerationsCounter;

    @Override
    public void register() {
        totalGenerationsCounter = Counter.build()
                .name("generations_total")
                .help("Total number of generations")
                .labelNames("generator_name")
                .register();
    }

    @Override
    public void collectGenerationTimeMetric(GenerationMetricMessage metricMessage) {
        totalGenerationsCounter.labels(metricMessage.getGeneratorName()).inc();
    }
}
@Singleton
public class GenerationsDurationCollector implements GenerationsMetricCollector {
    private Counter generationsDurationCounter;

    @Override
    public void register() {
        generationsDurationCounter = Counter.build()
                .name("generations_duration")
                .help("Total duration of generations")
                .labelNames("generator_name")
                .register();
    }

    @Override
    public void collectGenerationTimeMetric(GenerationMetricMessage metricMessage) {
        generationsDurationCounter.labels(metricMessage.getGeneratorName()).inc( metricMessage.getDurationInSeconds());
    }
}
@Singleton
public class LongestDurationCollector implements GenerationsMetricCollector {
    private static final int TIME_TO_KEEP_OBSERVATION_IN_SECONDS = 5;
    private Gauge longestDurationGauge;
    private HashMap<String, DelayedObservations> observationsPerGeneratorName = new HashMap<>();

    @Override
    public void register() {
        longestDurationGauge = Gauge.build()
                .name("longest_duration_within_last_n_seconds")
                .help("Longest duration within last n seconds")
                .labelNames("generator_name", "n")
                .register();
    }

    @Override
    public void collectGenerationTimeMetric(GenerationMetricMessage metricMessage) {
        DelayedObservations observations = observationsPerGeneratorName.computeIfAbsent(metricMessage.getGeneratorName(), k -> new DelayedObservations(TIME_TO_KEEP_OBSERVATION_IN_SECONDS, SECONDS));
        observations.observe(metricMessage.getDurationInSeconds());
        longestDurationGauge
                .labels(metricMessage.getGeneratorName(), String.valueOf(TIME_TO_KEEP_OBSERVATION_IN_SECONDS))
                .set(observations.max());
    }
}
class DelayedObservations {
    private final long timeToKeepObservation;
    private final ChronoUnit unit;
    private final DelayQueue<DelayedObservation> observations = new DelayQueue<>();

    DelayedObservations(long timeToKeepObservation, ChronoUnit unit) {
        this.timeToKeepObservation = timeToKeepObservation;
        this.unit = unit;
    }

    void observe(double value) {
        observations.drainTo(new LinkedList<>());
        observations.add(new DelayedObservation(value));
    }

    double max() {
        return observations.stream().mapToDouble(DelayedObservation::getObservation).max().orElse(0.0);
    }

    private class DelayedObservation implements Delayed {
        private final double observation;
        private final Instant validUntil;

        DelayedObservation(double observation) {
            this.observation = observation;
            this.validUntil = now().plus(timeToKeepObservation, unit);
        }

        public double getObservation() {
            return observation;
        }

        @Override
        public long getDelay(TimeUnit unit) {
            return unit.convert(SECONDS.between(now(), validUntil), TimeUnit.SECONDS);
        }

        @Override
        public int compareTo(Delayed other) {
            long currentDelay = this.getDelay(TimeUnit.SECONDS);
            long otherDelay = other.getDelay(TimeUnit.SECONDS);
            if (currentDelay > otherDelay)
                return 1;
            if (currentDelay < otherDelay)
                return -1;
            return 0;
        }
    }
}
@Singleton
public class QueueFillFactorCollector {
    private Gauge queueFillFactorGauge;

    @Inject
    OneByOneGenerator oneByOneGenerator;

    @Inject
    BatchMultipleQueuesGenerator batchMultipleQueuesGenerator;

    @Inject
    BatchSingleQueueGenerator batchSingleQueueGenerator;

    @Inject
    CachedJsonGenerator cachedJsonGenerator;

    public void register() {
        queueFillFactorGauge = Gauge.build()
                .name("queue_fill_factor")
                .help("Fill factor of queues")
                .labelNames("generator_name")
                .register();
        registerQueueFillFactorCallback(oneByOneGenerator::fillFactor, OneByOneResource.class.getSimpleName());
        registerQueueFillFactorCallback(batchSingleQueueGenerator::fillFactor, BatchSingleQueueGenerator.class.getSimpleName());
        registerQueueFillFactorCallback(batchMultipleQueuesGenerator::fillFactor, BatchMultipleQueuesGenerator.class.getSimpleName());
        registerQueueFillFactorCallback(cachedJsonGenerator::fillFactor, CachedJsonGenerator.class.getSimpleName());
    }

    private void registerQueueFillFactorCallback(MetricCallback metricCallback, String generatorName) {
        queueFillFactorGauge.setChild(new Gauge.Child() {
            @Override
            public double get() {
                return metricCallback.get();
            }
        }, generatorName);
    }

    interface MetricCallback {
        double get();
    }
}
Exposing metrics to Prometheus server

The way Prometheus works is to periodically scrape configured endpoints. Our endpoint is exposed at http://localhost:8080/uuid-generation-performance/resources/metrics with the following code.

@Stateless
@Path("metrics")
public class MetricsResource {

    @GET
    @Produces("text/plain")
    public Response get() {
        StreamingOutput streamingOutput = outputStream -> {
            Writer bufferedWriter = new BufferedWriter(new OutputStreamWriter(outputStream));
            try {
                TextFormat.write004(bufferedWriter, CollectorRegistry.defaultRegistry.metricFamilySamples());
                bufferedWriter.flush();
            } finally {
                bufferedWriter.close();
            }
        };

        return Response.ok(streamingOutput).build();
    }
}

It could have been also exposed by adding a simpleclient_servlet maven dependency and configuring this servlet in web.xml, as in the code fragment below.

<web-app ...>
  (...)
  <servlet>
    <servlet-name>metrics</servlet-name>
    <servlet-class>io.prometheus.client.exporter.MetricsServlet</servlet-class>      
  </servlet>
  <servlet-mapping>
    <servlet-name>metrics</servlet-name>
    <url-pattern>/metrics</url-pattern>
  </servlet-mapping>
  (...)
</web-app>

Such configuration would make the endpoint available at http://localhost:8080/uuid-generation-performance/metrics.

Provided that the initial code is really very simple, I prefer that over additional dependency and servlet definition in web.xml. This is not a strong opinion though.

Setting up Prometheus

Prometheus is running in a separate Docker container defined with this Dockerfile. Its first objective is to periodically scrape all configured endpoints to collect metrics. They are saved in Prometheus internal database and analysed according to the optional alerting rules. When some alert is fired (becomes active), it is sent by Prometheus to Alertmanager running as a separate, external service.

FROM prom/prometheus
MAINTAINER Bartosz Kamiński, http://highcohesionloosecoupling.com
ADD prometheus.yml /etc/prometheus/
ADD alerts.rules /etc/prometheus/
CMD [ "-config.file=/etc/prometheus/prometheus.yml", \
      "-storage.local.path=/prometheus", \
      "-web.console.libraries=/usr/share/prometheus/console_libraries", \
      "-web.console.templates=/usr/share/prometheus/consoles", \
      "-alertmanager.url=http://alertmanager-configured:9093" ]

The Dockerfile adds two configuration files and defines an URL of alert manager (which cannot be defined via configuration files at this point of time). A prometheus.yml defines in our case the frequency of evaluating alerting rules (line 2), the name of the file with alerts rules (line 5), and configures scraping (lines 13-17). You will find a full description of available options here, in an otherwise great official documentation.

global:
  evaluation_interval: 5s

rule_files:
  - "alerts.rules"

scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 15s
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'uuid-generation'
    metrics_path: '/uuid-generation-performance/resources/metrics'
    scrape_interval: 1s
    static_configs:
      - targets: ['wildfly-configured:8080']

Prometheus exposes its own UI, available in our setup at http://localhost:9090. It mostly offers quick access to the metrics stored in Prometheus. It is however enough to quickly test a very powerful expression language (described here) and check the results in a table or a simple graph. It is also possible to verify current Prometheus configuration (Status > Configuration) and the state of alerts.

Setting up Grafana

Now that we have metrics collected in Prometheus, it is time to visualise them in Grafana. Grafana is also running in a separate Docker container, according to this definition. This time there is even less happening in the Dockerfile – it only adds a dashboard configuration JSON file to the Docker container.

FROM grafana/grafana
MAINTAINER Bartosz Kamiński, http://highcohesionloosecoupling.com
ADD uuid_generation_performance.json /uuid_generation_performance.json

Grafana is not too Docker-friendly right now. Most of the configuration needs to be done using its API accessed once Grafana is already started. And in order to automatically configure Grafana, we need to perform two operations: configure Prometheus data source and import a dashboard definition. We take advantage of having the whole environment scripted with javascript, and add necessary steps in the ups.js script. A full version is available here, and the most important lines related to Grafana configuration are presented below. As in case of other components of the overall solution, we first need to build a Docker image. Next, we need to start a container. And in case this is the first time and the container has not existed yet, we perform API-based Grafana configuration once the container is up and running. This is done using curl tool sending JSON payload to Grafana endpoints with POST requests, to first configure Prometheus data source and next import a dashboard. You can find full description of Grafana API in this documentation.

(...)
function middlewareUp() {
  (...)
  dockerImages.build('./docker/grafana', 'grafana-configured');
  (...)
  var configureGrafanaFirstTime = !dockerContainers.exists('grafana-configured');
  (...)
  dockerContainers.run('grafana-configured', 'grafana-configured', '-p 3000:3000 --network jee');
  dockerContainers.waitFor('grafana-configured', 'Initializing HTTP Server');
  if (configureGrafanaFirstTime) {
    configureGrafana();
  }
}

function configureGrafana() {
  addPrometheusDataSourceToGrafana();
  importDashboardToGrafana();
}

function addPrometheusDataSourceToGrafana() {
  var prometheusDatasourcesEndpoint = "http://admin:admin@localhost:3000/api/datasources";
  var configurationRequestHeaders = "Content-Type: application/json";
  var configurationPayload = {
    name: "Prometheus",
    type: "prometheus",
    url: "http://prometheus-configured:9090",
    access: "proxy",
    isDefault: true
  };
  configurationPayload = JSON.stringify(configurationPayload).replace(/"/g, "\\\"");
  new Command('./', ['docker', 'exec', '-i', 'grafana-configured', '/bin/sh', '-c', "curl '" + prometheusDatasourcesEndpoint + "' -H '" + configurationRequestHeaders + "' -d '" + configurationPayload + "'"]).execute();
}

function importDashboardToGrafana() {
  var prometheusDashboardsEndpoint = "http://admin:admin@localhost:3000/api/dashboards/db";
  var configurationRequestHeaders = "Content-Type: application/json";
  new Command('./', ['docker', 'exec', '-i', 'grafana-configured', '/bin/sh', '-c', "curl '" + prometheusDashboardsEndpoint + "' -H '" + configurationRequestHeaders + "' -d @/uuid_generation_performance.json"]).execute();
}
(...)

The dashboard was first created manually in Grafana and exported to a JSON file (Share dashboard > Export > Save to file). Before being able to import the file with dashboards API, the file required a few minor modifications: to add a "dashboard" outermost JSON attribute and replace all "${DS_PROMETHEUS}" placeholders occurrences with "Prometheus" specific value.

Dashboard

To access the dashboard defined in our example, we need to first log into Grafana at http://localhost:3000 with user admin and password admin. You can now click on the Home drop down button and select UUID Generation Performance dashboard. To present how the dashboard is behaving, we will use a Random  JMeter thread group from the already known uuid-generation-distributed-load.jmx definition, available here. It calls a special random endpoint (available at http://localhost:8080/uuid-generation-performance/resources/random), which evenly distributes HTTP requests across all available generators. The screenshot below presents the dashboard under sustainable load.

In a moment we will try to break this steady pace by increasing the rate of requests to see how our generators behave in such case. However before we do that, let’s go through a couple of important concepts that make up the whole dashboard. For the complete reference, have a look at the official Grafana documentation available here.

A dashboard is composed of rows, into which we can add panels like graph, singlestat or text. Our dashboard consists of graphs only. For each graph there are multiple options available in graph editor. Starting with the General tab, we can set the title of the graph and decide on its dimensions (how many width units will it take in a row). The Metrics tab is the most important one, as it allows to put an expression used to fetch data from the configured data source. It is important to remember, that this needs to be expressed in a language understood by the service behind a data source, so in our case we need to use Prometheus expression language. In the remaining tabs we can, among others, configure axes, decide what is visible in the legend and change the way the graph itself is displayed.

To understand how graphs are plotted based on Prometheus expressions, it is important to understand two primary data types: instant vector and range vector (explained in details here). The former is a set of values for a particular time series name for a single timestamp, and the latter is the same but for a range of timestamps. So, for example generations_duration expression results in an instant vector of duration values for a single timestamp for each of the recorded generator, while the generations_duration[1m] expression returns 60 duration values for each of the recorded generation (since in our example the scrape interval for uuid-generation job is set to one second). Only instant vector is the data type that can be used in Grafana graphs.  Does it mean that range vectors are useless in our case? No, they are used in Prometheus functions. In our example we are using a rate() function, or to be precise, its irate() variant, which was designed for volatile, fast-moving observations.

With this knowledge we can have a look at the graphs used in our dashboard. Total number of generations, total duration, and queue fill factor are fairly basic. They are based on generations_totalgenerations_duration, and queue_fill_factor expressions respectively. Longest duration within last n seconds, for n=5 is based on longest_duration_within_last_n_seconds{n="5"} expression, where {n="5"} is the selector of a specific series provided that there were more time series available for different values of n. Number of generations per second and duration per second are based on irate(generations_total[15m]) and irate(generations_duration[15m]) expresssions respectively, where [15m] means that irate function should be looking up to 15 minutes back for the two most recent data points to calculate the per-second instant rate of increase of the time series in the range vector (quotes from the documentation to the irate function).

For each graph, we are using {{generator_name}} expression as the legend format in the Metrics tab. It makes the generator_name time series label a description for the series in the graph legend. Provided no such expression was defined, instead of BatchMultipleQueuesGenerator in a legend, we would see a full representation of the series, like: queue_fill_factor{generator_name=”BatchMultipleQueuesGenerator”, instance=”wildfly-configured:8080″, job=”uuid-generation”}.

Another point worth mentioning is that it is very easy to represent in a single graph values that differ by orders of magnitude. As we can see in the picture above, the current value of duration per second for straightforward generator equals 3s, while for reference generator this is 337µs. On a regular graph, this would make all results other than the straightforward one so close to zero, that they would not be even visible. In Grafana however we can easily change the scale of Y axis to a logarithmic one (in graph editor, in Axes tab) to make all values visible in a single graph.

Also note that it is just a matter of selecting the right unit for the Y axis in Grafana to start auto conversion of values. Instead of printing zero most of the time, Grafana displays reasonable values only picking the most suitable unit. We can see this at work again, in the duration per second graph. Our unit of choice is seconds in this case.

Overall, Grafana is an extremely powerful tool. What we have covered so far is only a tip of an iceberg. Its huge feature set is not only related to dashboards, but also topics like users management (which allows us to define separate view only roles). Altogether it adds up to a really impressive monitoring solution.

Stress tests

So far we have gained a good understanding of what we are actually observing. We know the optimisations applied to the generators, details of the metrics gathered in Prometheus and exact rules behind the graphs in the dashboard. We can now challenge our solution. Let’s have a look at the snapshot of the dashboard once we start increasing the load.

In this case, the load was increased by the factor of 1,(6). Technically this was done by shortening a maximum random pause in JMeter Random thread group from 1000ms to 600ms (Random > Think Time > Pause). As we can see, for a short period of time after the load was increased, all generators are stil performing stable. However, by looking at the queue fill factor graph, we can see that producers of UUIDs do not actually keep up with consumption generated by JMeter. As a result, when the cached JSON generator queue is completely drained, the service time of the generator increases significantly underperforming even the straightforward generator. Provided that we would pick cached JSON generator as the production one, this would be definitely something to keep an eye on. By the way, by looking at the results of other queue-based generators, they are all pretty comparable so far.

Alerts

Grafana dashboards are great to visualise what is happening in the system. It would be however quite impractical to have to stare at the dashboard all the time trying to spot problems. This needs to be automated. And we have already started configuring Prometheus to alert us when the system does not perform well. According to the prometheus.yml configuration file, every 5 seconds Prometheus is evaluating rules specified in the alerts.rules file. When some alert becomes active it is sent to Alertmanager, to the endpoint we have specified at the command line starting Prometheus.

Before we get to the Alertmanager, let’s first have a look at the rules defined in Prometheus. Following the advice given in alerting best practices in Prometheus documentation, it is recommended to read My Philosophy on Alerting by Rob Ewaschuk. He distinguishes two types of monitoring: symptom- and cause-based, favouring the former, monitoring from the user perspective. Two of his arguments for avoiding the cause-based monitoring are that you’re going to catch the symptom anyway and once you catch the symptom and the cause, you have redundant alert (quotes from Rob Ewaschuk’s paper). He also states however, that sometimes (…) you want rules to know you’re walking towards a cliff. And hopefully, this is exactly the case of our first alert rule. Even though it is cause-based, it could appear quite useful predicting significant degradation of cached JSON generator performance. Our first rule is activated in case a queue fill factor of this generator drops below 0,5 and remains in this state for at least 5 seconds. When this happens, Prometheus triggers a warning alert.

ALERT UUIDGenerationQueueFillFactorDropped
  IF queue_fill_factor{generator_name="CachedJsonGenerator"} < 0.5
  FOR 5s
  LABELS { severity = "warning" }
  ANNOTATIONS {
    summary = "UUID generation queue fill factor dropped below 0.5 on {{ $labels.instance }} instance.",
    description = "Queue fill factor dropped below 0.5. Expect significant drop in UUID generation performance. Current value of queue fill: {{ $value }}.",
  }

ALERT UUIDGenerationLongResponseTime
  IF longest_duration_within_last_n_seconds{generator_name="CachedJsonResource",n="5"} > 1
  FOR 5s
  LABELS { severity = "severe" }
  ANNOTATIONS {
    summary = "Response time of UUID generator on {{ $labels.instance }} instance is longer than 1s.",
    description = "Response time of UUID generation extended significantly. Longest response time from last 5 seconds: {{ $value }}.",
  }

The second one is the symptom-based one. It generates a severe alert in case Prometheus observes longest duration of generation greater than 1 second for at least 5 seconds. Note how we can use double curly brackets placeholders to dynamically generate summary and description.

While Prometheus is responsible for evaluating alert rules, Alertmanager is responsible for grouping them and forwarding to appropriate recipients according to the defined channels. Available types of recipients include e-mail, Slack, generic HTTP endpoint and multiple paging APIs. Additionally we can use features like inhibition (as explained in Alertmanager documentation it is a concept of suppressing notifications for certain alerts if certain other alerts are already firing) and silences (a straightforward way to simply mute alerts for a given time). It is also possible to define routing trees, for which there is even a visual editor available.

What we cover here is a simplistic scenario that uses a global route to forward all of the alerts to a fictitious high cohesion loose coupling (hclc in short) team which uses a Slack workspace. To be able to post messages from the outside, we need to configure an incoming webhook in Slack which requires creating a Slack app in our workspace first. After successful configuration, Slack offers us a webhook URL for our workspace. We can also set a custom icon for the messages posted with this Slack app. The webhook URL together with the name of the Slack channel need to be used in Alertmanager configuration file. This is the file used in our project.

route:
  group_by: ['alertname']
  group_wait: 1s
  group_interval: 5m
  repeat_interval: 3h 
  receiver: team-hclc-slack

receivers:
- name: 'team-hclc-slack'
  slack_configs:
  - api_url: 'https://hooks.slack.com/services/8JDD53M81/8XL193DRQ/k95TYann51CX04HLRid3nb3b'
    channel: 'alerts'
    title: "{{ range .Alerts }}{{ .Annotations.summary }}\n{{ end }}"
    text: "{{ range .Alerts }}{{ .Annotations.description }}\n{{ end }}"

Deployment-wise, Alermanager is running in a separate container defined by this Dockerfile.

FROM prom/alertmanager
MAINTAINER Bartosz Kamiński, http://highcohesionloosecoupling.com
ADD config.yml /etc/alertmanager/config.yml

It is really simple. It only adds above mentioned config.yml file overriding the default one. There is no additional configuration required. When the Docker container is started, Alertmanager UI can be accessed at http://localhost:9093.

This is what we could see in Slack in the stress test scenario discussed above.

GitHub project

You will find a full, executable source code accompanying the post in this github project. A complete, scripted Docker environment is described here. I was running the project on Windows 10 Pro, Oracle VirtualBox and Docker Toolbox. To start the environment, run:

  1. .\windowsDockerMachineUp.ps1 script to create a dedicated Docker machine in Oracle VirtualBox (in a PowerShell window with Administrator privileges), and
  2. jjs .\up.js to create docker images and containers, and to compile and deploy Java project.

The benefit of having a Docker environment is ease of recreating a fresh container for each of the servers used in this post. There is a bunch of *.bat scripts to reload containers. This one, for example, reloads WildFly server in the following steps.

  1. docker rm -f wildfly-configured,
  2. docker rmi wildfly-configured,
  3. jjs .\up.js.

The whole Docker scripts setup was also used in my previous post and is based on my other GitHub Links project. For details of that setup have a look here. All these setups are heavily influenced by the Docklands project by Adam Bien.

Conclusion

By no means does this project cover all aspects of performance measurement and monitoring. On the contrary, the point of this post was to present only couple of techniques, nonetheless diversified. It was however also the point that they may be applied on different stages of a complete application building process. I mean a proces focused on performance.

Therefore we first started with a simple UUIDs generation example analysing performance of couple of possible design decisions. Next, we built a test plan in JMeter, which was producing load for all our test cases. At this point of time we did a short overview of how we can use VisualVM tool to analyse internal behaviour of JVM. We stated that although interesting and to some degree valuable, our measurements were not realistic. At this point of time we instrumented our code and configured Prometheus to collect metrics exposed by our application. We used Grafana to display a nice looking dashboard. And we also configured Prometheus to page us on Slack.

And that’s it! It turns out that going though all these steps related to measuring and monitoring performance was not that hard. At least in my case, the project allowed me to draw couple of interesting conclusions – which I tried to share with you along the way. Simplicity of Java Enterprise Edition surrounded with tools like Prometheus and Grafana offers great technical foundation for building reliable systems. And having this ground covered, the remaining work is to actually build these system. Good luck!

Site Footer

Sliding Sidebar

About Me

About Me

My name is Bartosz Kaminski. I am a software engineer, looking for simple and elegant software design achieved by great teams. My professional interests revolve around clean code, Java EE, microservices, DDD, distributed systems, motivation and building great teams.

Social Profiles