Are you looking at a Gatling Test Report for the first time and feeling overwhelmed? Or perhaps you’ve been using it for a while without reaaaally understanding everything on there? Welcome! Let’s take a closer look at what it all means together!

The Importance of Good Data

So, we have a web application and wish to know how it performs under a certain workload. Enter Gatling! The enabler of our desire to exert undue pressure on (hopefully not unsuspecting) systems.

In the grander scheme of Performance Testing the details of how we write our simulation code, whether that be as cleanly implemented pieces of ingenuity or held together by bits of tape, are not so important.*

What matters from a testing perspective is, rather, the quality of the information we receive. We need data in order to analyse and evaluate performance, and this is where reports come in.

*I mean, it still IS important, of course. For maintainability reasons, for a start, but it’s arguably a lot less important than test design and data analysis when it comes to making statements about the system under test.

insight-levels

Gatling is a Tool

Gatling takes measurements, records timings, and presents us with information in a useful way. We might read these reports and use the information to adjust our testing strategy. Reading reports and writing tests can be fairly closely coupled activities, especially if we’re using Gatling to measure responses under normal circumstances before ramping up the pressure.

However, it’s important to remember that Gatling is a tool and cannot, for example, identify whether our simulation includes business critical scenarios or context appropriate assertions.

illu-garb

Our report does not exactly represent our application’s behaviour. The accuracy is, and can only be, dependant on how well we designed our tests. Which paths through the system we included. How realistic the test data was. How representative the load we simulated.

If our simulation or tests are lacking, the report will still look lovely, but won’t be giving us an accurate reading on system behaviour. Or in other words, Garbage In Garbage Out.

Let us continue, then, working under the assumption that our simulation and tests are fairly sensible, and the sticking point is in understanding what our report is telling us about our application!

What on earth does it all mean?

Being presented with all of this (useful, wonderful) data all in one go can be a little overwhelming, at first. If you’re feeling flustered, take a moment to breathe. You’re in the right place, because we’re going to break the report down and go through it together.

report-screenshot-annotated-1-v2
report-screenshot-annotated-2
report-screenshot-annotated-3-v2

The centrepiece of our report is, of course, the data itself. However, it’s very rare for humans to make interesting inferences simply from reading abstract numerical data. Gatling knows this, which is why their reports also have a large visual component.

So we have our tabular data, and various graphical representations. All the other tabs and sections simply help us to narrow our focus by filtering the data to show us smaller subsets (detailed views).

The Report, in numbers

Assertions

  • Assertion: Our assertions are listed in nice readable sentences
  • Status: Whether our assertion passed (OK) or failed (KO)


Statistics

Requests

  • All requests, or groups of requests, that were executed as part of our simulation are listed here. We can click on groups to see the requests contained therein.Wondering where the report pulls the request and group names from?
Report-Code-Names-2

Executions


  • Total: Total times the event* was executed
  • OK: Successful events
  • KO: Failed events
  • % KO: Percentage of failed events
  • Cnt/s: Count of events per second

*Event can refer to either a single request or a group of requests


Response Time


  • Min: The fastest response time
  • Percentiles: How the response time compares with others*
  • Max: The slowest response time
  • Mean: The average response time.
  • StdDev: How far the response times vary from the average*

*Percentiles and Standard Deviation are worth taking a closer look at, more on that later


Errors

  • Error: The error itself (e.g. 500 http response, timeout exception…)
  • Count: How often the error occurred
  • Percentage: The percentage this error comprised of errors overall

Why Averages are misleading

illu-notmaths

It’s tempting, when faced with so many different numbers, to hone in on the average response time. But is it enough to say that our customers have a good experience on average?

Let’s consider this result:

  • 5 users load our shop within 2 seconds
  • 2 users only wait 3 seconds
  • 2 users have to wait 5 seconds
  • and 1 user has to wait for 8 entire seconds

On average, users waited for 3.4 seconds. That’s great, right?
Perhaps if we hadn’t just read the actual breakdown we’d leave it there?

If we say that anything over 3 seconds is too slow, then 30% of our users had a bad experience. The average alone cannot tell us that, and under normal circumstances we won’t be able to see what happened (as in the breakdown above) because we will have data not for ten users but for thousands.

So what information are we missing? It would be useful to know how similar the other response times are to the average…

illu-notmaths2

Don’t be alarmed, Gatling provides us with this information, in the form of Standard Deviation and Percentiles. We don’t need to do the maths ourselves – we just need to understand what these values represent.

illu-notmaths3

Standard Deviation

How are the response times spread out from the average?

A low value here indicates that all responses were close to the average, and a high value indicates that they were more spread out.

If we see a high standard deviation we will know that our application does not respond in a consistent manner, the average does not tell the whole story, and it’s time to investigate this erratic behaviour.

illu-sd

Percentiles

How does a response time compare to all the others?

All of these statements are saying the same thing:


  • Measured at the 90th percentile, the response time is 3 seconds.
  • A response time of 3 seconds is greater than 90% of all responses.
  • 90% of responses were faster than 3 seconds.
  • 10% of responses were slower than 3 seconds.

The larger our user base, the closer to the 100th percentile we want to measure. Whether or not a response time of 3 seconds measured at the 99th percentile is adequate or not will depend on how many end users make up that 1%!

illu-measure

The Report, visually

Our brains process visual data very quickly


Graphical representations of data allow us to identify patterns, trends, and exceptions much more easily than staring at a page of text. It’s no coincidence that Gatling’s reporting is so rich in visual information!

Gatling presents to us not only the statistical data from the table earlier, but additional information that illustrates the simulation and application’s behaviour over time.

Time Series Data is illuminating

Can we draw any conclusions from knowing that there were N failed requests over the course of the simulation? It matters when the failures occurred, what else was happening at the same time, how many users were active at that second, or in the seconds before, and so on.

Customising the Report

You may have noticed that the Indicators graph uses standard ranges to delineate response times. Looking at this chart, and perhaps seeing a very large red bar, without being aware of the range boundaries it’s based upon, may result in unnecessary alarm.

redreport2

Why? Acceptable Performance does not look the same in every context, or for every application. There might be contexts where speed is imperative, and any sluggishness might compromise safety in some way. There are others where a more leisurely pace is expected (and even tolerated).

We don’t judge our bicycle journey to the supermarket against the average speed of a participant in the Tour de France! Context is important for determining whether something was slow or not.

How do we ensure that the convenient traffic light visual uses acceptable ranges? By changing the values in the gatling.conf file.

Load testing is a means to an end

Why is the report not the end of our performance improvement story?

We are, essentially, engaging in a kind of black-box testing. We send a request, and record how long the response takes. What happens inside the application, where exactly in our code, or infrastructure, bottlenecks or failures might be? Unknown.

We want to be able to use our report to identify the moment when performance deteriorates. Or at least to give us some clues for where to focus our follow-up investigations.

If debugging a performance issue is like a criminal investigation, then our Gatling report would be like CCTV footage of the crime. It doesn’t tell the whole story, but it provides us with vital pointers when we’re deciding how to proceed.

illu-magnification

This article was written and illustrated by Constance Armitage, who works as a Test Automation Engineer for Quality Minds.

QM is a young IT company who offer services in the field of QA in agile software development. The team consists of more than 200 experts in software testing and software development, IT architecture and requirements engineering. They advise customers, offer workshops and are “hands-on” on IT projects.