APM metrics: complete guide for performance testing teams

Diego Salinas
Enterprise Content Manager

APM metrics are the quantifiable measurements that track your application's health, speed, and efficiency—covering response times, error rates, throughput, and resource utilization across your entire stack. They're what stands between you and the 3 AM phone call about production being down, with downtime costing over $300,000 per hour for most organizations.

This guide covers the core metrics every performance testing team should track, how infrastructure and trace metrics fit into the picture, and how to connect your load testing results directly to production monitoring.

What are APM metrics

APM (Application Performance Monitoring) metrics are quantifiable measurements that track the health, speed, and efficiency of software applications. They focus on four core areas:

  • Response time
  • Error rates
  • Throughput
  • Resource utilization

APM tools collect these measurements continuously across your entire application stack—from frontend interfaces to backend services and underlying infrastructure. The goal is straightforward: spot problems before users do. When response times creep up or error rates spike, APM metrics give you the data to investigate and fix issues quickly.

Why APM metrics matter for performance testing teams

Here's something useful to know: load testing tools and APM platforms track the same core metrics. Response times, throughput, error rates, latency percentiles—they're identical whether you're running a Gatling simulation or monitoring production traffic in Datadog.

That overlap creates a direct connection between testing and production. When your load test shows a p95 latency of 200ms under 1,000 concurrent users, you can compare that number directly against what your APM tool reports in production. If production latency suddenly jumps to 350ms, you have a concrete reference point for investigation.

Without this shared vocabulary, performance testing happens in isolation. Teams run tests, see results, and hope those numbers translate to real-world behavior. With APM metrics as your common language, you can validate assumptions and catch regressions before they reach users.

Essential application performance monitoring metrics to track

Application-layer metrics form the foundation of any monitoring strategy. They measure what your code is actually doing, independent of the servers running it.

Apdex score

Apdex (Application Performance Index) translates raw response times into a standardized satisfaction score between 0 and 1. You define a threshold—say, 500ms—and the formula categorizes every response as satisfied, tolerating, or frustrated based on how it compares to that threshold.

The score is particularly useful for communicating with stakeholders who don't want to interpret percentile charts. An Apdex of 0.94 means "most users are happy." An Apdex of 0.67 means "we have a problem." Many teams use Apdex thresholds directly in their SLAs.
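As a sketch, the standard Apdex formula—(satisfied + tolerating / 2) / total—takes only a few lines; the sample response times below are made up for illustration:

```python
def apdex(response_times_ms, threshold_ms=500):
    """Apdex = (satisfied + tolerating / 2) / total samples.

    Satisfied:  response time <= T
    Tolerating: T < response time <= 4T
    Frustrated: response time > 4T
    """
    satisfied = sum(1 for t in response_times_ms if t <= threshold_ms)
    tolerating = sum(1 for t in response_times_ms if threshold_ms < t <= 4 * threshold_ms)
    return (satisfied + tolerating / 2) / len(response_times_ms)

# Hypothetical samples: three satisfied, two tolerating, one frustrated
samples = [120, 300, 450, 700, 1900, 2500]
score = apdex(samples, threshold_ms=500)   # (3 + 2/2) / 6 = 0.67
```

With a 500 ms threshold, the three fast responses count fully, the two mid-range ones count half, and the 2.5-second outlier counts for nothing, yielding roughly 0.67: "we have a problem."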

Response time and latency percentiles

Average response time can be misleading. If 95% of your requests complete in 100ms but 5% take 3 seconds, your average might look acceptable while thousands of users experience frustration.

Percentiles tell the full story:

  • p50 (median): The typical user experience—half of all requests are faster than this value
  • p95: What slower requests look like—only 5% of users experience worse performance
  • p99: The worst-case scenarios, excluding extreme outliers—critical for understanding your most impacted users

When setting performance goals, p95 and p99 matter more than averages. They reveal the experience of users who might otherwise leave without complaining.
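The averages-versus-percentiles point can be checked with a small nearest-rank percentile sketch; the 95%/5% latency split below mirrors the example in the text:

```python
import math

def percentile(values, pct):
    """Nearest-rank percentile (pct in 0..100) over a list of samples."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

# 95 fast requests (100 ms) and 5 slow ones (3000 ms), as in the text
latencies = [100] * 95 + [3000] * 5

mean = sum(latencies) / len(latencies)   # 245 ms -- looks acceptable
p50 = percentile(latencies, 50)          # 100 ms -- typical experience
p99 = percentile(latencies, 99)          # 3000 ms -- the tail the mean hides
```

The mean sits at 245 ms and never happens to any real user; p99 exposes the 3-second tail directly.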

Request rate and throughput

Throughput measures capacity: how many requests your application handles per second (RPS) or per minute (RPM). This metric answers fundamental questions about scale.

Can your checkout service handle 500 transactions per second during a flash sale? What happens when traffic doubles? Throughput trends also reveal problems—a sudden drop might indicate upstream failures, while unexpected spikes could signal bot traffic or a viral moment.
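As a minimal illustration, throughput can be derived by bucketing request timestamps into one-second windows; the timestamps below are hypothetical:

```python
from collections import Counter

def rps_per_second(timestamps):
    """Bucket request timestamps (in seconds) into 1-second windows and count each."""
    buckets = Counter(int(ts) for ts in timestamps)
    return dict(sorted(buckets.items()))

# Steady ~4 RPS for two seconds, then a sudden drop in second 2
ts = [0.1, 0.3, 0.5, 0.9, 1.2, 1.4, 1.6, 1.8, 2.7]
rates = rps_per_second(ts)   # {0: 4, 1: 4, 2: 1}
```

The drop from 4 RPS to 1 RPS in the last window is exactly the kind of trend change worth alerting on.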

Error rate

Error rate tracks failed requests as a percentage of total requests. A 0.1% error rate sounds small until you realize that's 1,000 failures per million requests.

The metric becomes most valuable when correlated with other signals. Low latency with high errors might indicate fast failures—your service is rejecting requests quickly. High latency with rising errors often points to timeouts or resource exhaustion.
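A small sketch of both ideas follows; the thresholds for what counts as "slow" and "high error" are assumptions for illustration:

```python
def error_rate_percent(failed, total):
    """Failed requests as a percentage of all requests."""
    return 100 * failed / total if total else 0.0

def triage(p95_ms, error_pct, slow_ms=500, high_err_pct=1.0):
    """Rough triage of the latency/error correlation described above (assumed thresholds)."""
    if error_pct > high_err_pct:
        return "timeouts or resource exhaustion" if p95_ms >= slow_ms else "fast failures"
    return "healthy"

rate = error_rate_percent(1_000, 1_000_000)   # 0.1% -- "small", yet 1,000 failed requests
status = triage(p95_ms=120, error_pct=4.2)    # fast failures: rejecting requests quickly
```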

Infrastructure metrics for application performance

Application metrics tell you what's happening. Infrastructure metrics help explain why. When response times spike, these measurements point toward root causes.

CPU and memory utilization

CPU utilization above 80% sustained often indicates a performance bottleneck. Your application might be doing too much work per request, running inefficient algorithms, or simply undersized for current traffic.

Memory pressure creates different symptoms. Gradual increases suggest memory leaks. Sudden spikes might indicate large payload processing or cache misses. When memory runs low, applications start swapping to disk or triggering aggressive garbage collection—both devastating for latency.

Garbage collection metrics

For applications running on managed runtimes like the JVM (Java, Scala, Kotlin), garbage collection directly impacts user experience. During GC pauses, your application literally stops processing requests.

Track GC frequency and duration. Minor collections happening constantly suggest your application creates too many short-lived objects. Major collections taking hundreds of milliseconds will show up as latency spikes in your p99 metrics.

Instance count and availability metrics

Uptime percentage measures reliability—99.9% availability still means about 8.76 hours of downtime per year. For critical services, even 99.99% might not be enough.

Instance count matters in auto-scaling environments. If your application scales from 3 to 15 instances during peak traffic, that's useful capacity planning data. If it scales to 15 instances and still struggles, you've found a bottleneck that horizontal scaling can't solve.
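The downtime arithmetic behind those availability figures is a one-liner:

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 (ignoring leap years)

def annual_downtime_hours(availability_pct):
    """Annual downtime budget implied by an availability target."""
    return (1 - availability_pct / 100) * HOURS_PER_YEAR

three_nines = annual_downtime_hours(99.9)    # ~8.76 hours per year
four_nines = annual_downtime_hours(99.99)    # ~0.876 hours, roughly 53 minutes
```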

APM trace metrics and transaction monitoring

With 85% of organizations adopting microservices, modern applications rarely exist as monoliths. A single user request might touch roughly 35 interconnected components spanning services, databases, and external APIs. Trace metrics follow that journey.

Distributed trace metrics

A trace captures the complete path of a request through your system. Each step—a service call, a database query, a cache lookup—becomes a span with its own timing data.

When a checkout request takes 2 seconds, traces show you exactly where that time went. Maybe 1.5 seconds happened in a single database query. Maybe latency accumulated across 20 microservice hops. Without traces, you're guessing. With them, you know precisely which component to optimize.
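A trace is just a set of timed spans; even a flat, made-up list of spans (real traces nest them) shows how the dominant contributor falls out:

```python
# Flat view of one hypothetical checkout trace: (span name, duration in ms)
spans = [
    ("auth-service", 40),
    ("inventory-service", 120),
    ("db: SELECT orders", 1500),   # invented slow query
    ("payment-gateway", 300),
]

total_ms = sum(ms for _, ms in spans)                # 1960 ms overall
slowest_name, slowest_ms = max(spans, key=lambda s: s[1])
share_pct = round(100 * slowest_ms / total_ms)       # ~77% of the request
```

One query accounts for roughly three-quarters of the request; that is the component to optimize first.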

Database query performance metrics

Slow queries cause more performance problems than almost any other factor. A single unoptimized query running on every request can bring down an entire application.

Key database metrics to watch:

  • Query execution time: Both average and p95, broken down by query type
  • Connection pool utilization: Running out of connections causes requests to queue
  • Lock contention: Queries waiting on locks indicate concurrency issues

Adding an index or rewriting a join often delivers 10x improvements with minimal code changes.
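As a sketch, per-query p95 can be computed from (query, duration) samples using the same nearest-rank approach; the query name and timings below are invented:

```python
from collections import defaultdict
import math

def p95_by_query(samples):
    """Group (query_name, duration_ms) samples and compute a nearest-rank p95 per query."""
    groups = defaultdict(list)
    for name, ms in samples:
        groups[name].append(ms)
    result = {}
    for name, times in groups.items():
        times.sort()
        rank = max(1, math.ceil(0.95 * len(times)))
        result[name] = times[rank - 1]
    return result

# Hypothetical samples pulled from slow-query logs or an APM agent
samples = [("SELECT orders", t) for t in range(10, 210, 10)]  # 10..200 ms, 20 samples
worst = p95_by_query(samples)   # {"SELECT orders": 190}
```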

End user experience monitoring metrics

Server-side metrics capture what your infrastructure experiences. Real User Monitoring (RUM) captures what actual users experience in their browsers—and the two can differ dramatically.

Page load time

A server might respond in 50ms, but the user's browser still takes 3 seconds to render the page. Network latency, asset loading, JavaScript execution, and rendering all add up.

Key components include Time to First Byte (TTFB), First Contentful Paint (FCP), and Largest Contentful Paint (LCP). These metrics often reveal optimization opportunities invisible to backend monitoring—uncompressed images, render-blocking scripts, or CDN misconfigurations.

User session metrics

Session duration, bounce rates, and conversion funnels connect technical performance to business outcomes. A 500ms increase in page load time might correlate with a measurable drop in conversions.

This connection helps prioritize performance work. Optimizing a page that 80% of users visit delivers more value than perfecting a rarely-used admin screen.

How to connect load testing results to APM metrics

Load testing and APM work best together. One validates performance before deployment; the other monitors it afterward. The metrics they share make this partnership possible.

Establishing performance baselines before production

Load tests create controlled conditions for measuring performance. Run a test with 1,000 concurrent users, and you know exactly what your p95 latency looks like at that load level.

These baselines become your reference points. When APM shows p95 latency climbing in production, you can compare against your test results. Is current traffic higher than what you tested? Did a recent deployment change performance characteristics?
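A minimal sketch of that comparison, using the 200 ms test baseline and 350 ms production figure from the earlier example, plus an assumed 20% tolerance:

```python
def regression_vs_baseline(baseline_p95_ms, production_p95_ms, tolerance=0.20):
    """Flag production latency that drifts more than `tolerance` above the tested baseline."""
    drift = (production_p95_ms - baseline_p95_ms) / baseline_p95_ms
    return drift > tolerance, drift

# Baseline from a 1,000-user load test vs. what APM reports today
regressed, drift = regression_vs_baseline(200, 350)   # (True, 0.75): 75% over baseline
```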

Correlating test throughput with production traffic

Effective load tests simulate realistic conditions. If production handles 200 RPS during normal hours and 800 RPS during peaks, your tests can cover both scenarios.

APM data tells you what "realistic" actually means. Pull traffic patterns from your monitoring tools, then replicate those patterns in your load tests.

This approach catches problems that synthetic, steady-state tests miss—like race conditions that only appear during traffic ramps.

Using APM metrics as load test assertions

Modern load testing tools support pass/fail criteria based on metrics. You can configure tests to fail if p95 latency exceeds 500ms or error rate climbs above 1%.

Gatling integrates directly with APM platforms like Datadog and Dynatrace, streaming test metrics alongside production data. This unified view lets you compare test runs against production baselines in the same dashboard.

How to choose the right application metrics for your stack

Not every metric matters equally for every application. Your architecture and business requirements determine which measurements deserve attention.

Performance priorities by application type:

  • Web applications: page load time, Apdex score, error rate
  • APIs & microservices: latency percentiles (p95/p99), throughput, distributed trace metrics
  • Data-intensive apps: database query time, GC metrics, memory utilization
  • Real-time systems: p99 latency, connection metrics, availability

Start with the four golden signals—latency, traffic, errors, and saturation—then add specificity based on what your users care about. An e-commerce site might prioritize checkout latency. A real-time collaboration tool might focus on p99 message delivery times.

Connecting load testing to observability platforms

Load testing becomes significantly more valuable when its metrics flow into your observability stack.

Gatling Enterprise Edition supports integrations with major platforms, allowing teams to correlate synthetic load with real infrastructure signals.

Datadog

With the Datadog integration, you can stream load test metrics directly into Datadog dashboards. This allows you to overlay test windows with infrastructure metrics, helping you identify exactly when latency increased and which components were affected.

Dynatrace

The Dynatrace integration enables correlation between load test traffic and distributed traces. You can tag test-generated requests and analyze them at code level, making microservice bottlenecks visible under synthetic stress.

New Relic

With New Relic, you can centralize load testing and APM analysis in one place. Test runs appear alongside production telemetry, making regression comparison straightforward.

InfluxDB

Teams using InfluxDB can push load test metrics into time-series databases and visualize them in Grafana. This is particularly useful for long-term trend analysis and custom dashboards.

OpenTelemetry

OpenTelemetry provides a vendor-neutral way to export metrics and traces. Integrating load testing into OpenTelemetry pipelines ensures your synthetic traffic participates in the same observability architecture as your production systems.

Using APM metrics as CI/CD gates

If you're implementing CI/CD performance automation, performance shouldn't be evaluated manually after deployment; it should be checked automatically in the pipeline.

Modern teams define acceptance criteria directly in their pipelines, turning performance testing into a release gate rather than a reporting exercise. Gatling Enterprise Edition supports run stop criteria and SLA thresholds.

For example:

  • Fail a build if p95 exceeds 500 ms
  • Stop a test if error rate rises above 2%
  • Abort execution if injector CPU exceeds safe limits
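A generic sketch of such a gate follows. This is not Gatling's own assertions DSL; the metric names and thresholds here are assumptions for illustration:

```python
def gate(results, max_p95_ms=500, max_error_pct=2.0):
    """Return a list of violated criteria; an empty list means the build may proceed."""
    violations = []
    if results["p95_ms"] > max_p95_ms:
        violations.append(f"p95 {results['p95_ms']} ms > {max_p95_ms} ms")
    if results["error_rate_pct"] > max_error_pct:
        violations.append(f"error rate {results['error_rate_pct']}% > {max_error_pct}%")
    return violations

# In a pipeline step you would exit non-zero on any violation, e.g.:
#   sys.exit(1 if gate(summary) else 0)
summary = {"p95_ms": 540, "error_rate_pct": 0.4}   # hypothetical run summary
failures = gate(summary)   # one violation: p95 over budget
```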

From monitoring to continuous performance visibility

Catching performance issues in production is reactive. Catching them during load testing is proactive. Catching them inside CI is preventative.

When load testing integrates with your APM system, performance becomes observable across the entire lifecycle.

This shift aligns directly with how large enterprises modernize performance engineering. Instead of running isolated load tests, teams build continuous performance visibility.

Turn APM metrics into continuous performance visibility

APM metrics become most valuable when they're part of a continuous strategy rather than occasional checkups. Catching issues in production is good. Catching them during load testing is better. Catching them in CI/CD before merge is best.

Teams using Gatling can stream load test metrics directly to their APM platforms, creating a single view of performance from development through production. The same dashboards that monitor production can also display test results, making comparisons immediate and obvious.

Explore Gatling Enterprise to see how continuous performance visibility works in practice.


FAQ

What does APM stand for in application monitoring?

APM stands for Application Performance Monitoring. It's the practice of tracking and optimizing how software applications perform in real-time, using metrics, traces, and logs to maintain visibility across the entire application stack.

What is the difference between APM metrics and infrastructure monitoring metrics?

APM metrics focus on application behavior—response times, error rates, and transaction traces. Infrastructure monitoring tracks underlying resources like CPU, memory, network, and disk. Most APM platforms collect both, since application problems often have infrastructure root causes.

What is the difference between APM metrics and load testing metrics?

APM metrics measure production application behavior with real user traffic. Load testing metrics measure behavior under simulated traffic in controlled environments. Both track similar KPIs—response time, throughput, error rate—which makes them complementary tools for comprehensive performance visibility.

Can APM metrics predict performance issues before they affect users?

Yes, when combined with proper baselines and alerting. Rising latency trends, increasing error rates, or growing resource utilization often signal problems before they become critical. Teams that set alerts on leading indicators can intervene proactively rather than reacting to user complaints.
